ML Functions
Overview
Machine learning (ML) functions enable trained models to be run directly within SQL queries.
Introduction to Machine Learning
Machine learning is a field of study in artificial intelligence that develops and applies methods for learning patterns from historical data and using those patterns to make predictions or decisions on new data.
Classification
Classification is a supervised learning technique that assigns each input to one of a predefined set of classes or labels.
Following are the types of classification:

- Binary Classification: Predicts one of two possible outcomes (for example, fraud vs. non-fraud).
- Multiclass Classification: Predicts one label from multiple possible categories (for example, product type A, B, or C).
- Multilabel Classification: Assigns multiple labels to a single data point (for example, tagging an image with "beach" and "sunset").

Examples:

- A credit-card transaction classified as "fraudulent" or "legitimate."
- Customer support tickets categorized as "billing," "technical issue," or "account upgrade."
The ML_CLASSIFY function is a supervised machine learning function for classification tasks. Pass rows of data to the ML_CLASSIFY function to return predicted class labels.
Anomaly Detection
Anomaly detection identifies data points that deviate significantly from expected patterns. Anomalies can indicate:

- Performance issues (for example, server overload)
- System faults (for example, failed jobs or memory leaks)
- Opportunities (for example, traffic spikes caused by a marketing campaign)
For example, if a cluster's CPU usage normally stays between 20–60% and suddenly rises to 95%, the spike is an anomaly.
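As a minimal illustration of the statistical approach (this is a toy sketch, not the ML_ANOMALY_DETECT implementation), a simple z-score check shows why a 95% reading stands out against a 20–60% baseline:

```python
# Toy z-score anomaly check for the CPU example above.
# Readings are made up; the last one is the spike.
import statistics

def zscore_outliers(values, threshold=2.0):
    """Return the values whose z-score exceeds the threshold."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / stdev > threshold]

cpu_usage = [35, 42, 28, 55, 40, 33, 47, 95]  # the last reading is the spike
print(zscore_outliers(cpu_usage))  # [95]
```

The normal readings sit well within two standard deviations of the mean, so only the 95% spike is flagged.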
Time-Series Anomaly Detection
Time-series anomaly detection analyzes data collected over time.
Following are the types of time-series anomaly detection:
- Supervised: Supervised models use labeled anomalies to learn failure patterns.
- Unsupervised: Unsupervised models learn normal behavior from historical data and flag deviations without labels.
The ML_ANOMALY_DETECT function currently supports unsupervised time-series anomaly detection.
Install ML Functions
To install ML Functions, navigate to AI > AI & ML Functions and select the deployment on which to install ML Functions.
Once the ML Functions are installed, query them in the SQL Editor or SingleStore Notebooks.
| Category | Functions |
|---|---|
| Statistical and Predictive Functions | ML_CLASSIFY, ML_ANOMALY_DETECT |
Statistical and Predictive Functions
ML_CLASSIFY
Performs binary and multi-class classification on a dataset using standard machine learning algorithms, such as:

- Logistic Regression
- Random Forest
- Gradient Boosting
Syntax
ML_CLASSIFY(model_name, TO_JSON(selected_data.*))
Arguments
- model_name: Name of the trained ML model to use.
- selected_data: A row or set of rows selected for prediction.
Return Type
string
Usage
Usage examples:

- Basic usage
- Basic usage with
- Insert predictions into a table
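ML_CLASSIFY returns a string containing JSON. The example notebook below extracts the `predicted_label` and `confidence` keys from it with JSON_EXTRACT_STRING and JSON_EXTRACT_DOUBLE; a client-side sketch of the same parsing in Python (the literal values here are invented for illustration):

```python
# Hypothetical shape of an ML_CLASSIFY result string: a JSON object with
# the predicted class label and a confidence score, matching the keys the
# example notebook extracts in SQL.
import json

raw = '{"predicted_label": "Survived", "confidence": 0.87}'

result = json.loads(raw)
print(result["predicted_label"])  # Survived
print(result["confidence"])       # 0.87
```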
ML_ANOMALY_DETECT
Detects outliers and anomalies in datasets using statistical or machine learning-based methods, such as:

- Statistical: z-score, interquartile range (IQR)
- ML-based: Isolation Forest, One-Class SVM
Syntax
ML_ANOMALY_DETECT(model_name, TO_JSON(selected_data.*))
Arguments
- model_name: Name of the trained ML model to use.
- selected_data: A row or set of rows selected for prediction.
Return Type
string
Usage
Usage examples:

- Basic usage
- Basic usage with
- Insert predictions into a table
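ML_ANOMALY_DETECT also returns a string containing JSON. A sketch of one result row in Python, using the keys (`TS`, `Series`, `Actual`, `Forecast`, `Lower bound`, `Upper bound`, `is_Anomaly`) that the anomaly-detection notebook below expands with pandas; the literal values here are invented for illustration:

```python
# Hypothetical shape of one ML_ANOMALY_DETECT result row.
import json

raw = ('{"TS": "2024-01-01 10:00:00", "Series": "Online", "Actual": 950.0, '
       '"Forecast": 120.0, "Lower bound": 20.0, "Upper bound": 220.0, '
       '"is_Anomaly": true}')

row = json.loads(raw)
# A point is flagged when the actual value falls outside the forecast bounds.
outside = not (row["Lower bound"] <= row["Actual"] <= row["Upper bound"])
print(row["is_Anomaly"], outside)  # True True
```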
Train a New ML Model
To train a new ML model, follow these steps:
- Navigate to AI > Models.
- Select the ML Models tab and then select Train New ML Model.
- In the Select Function dialog, select one of the following ML functions:
  - ML_CLASSIFY
  - ML_ANOMALY_DETECT
- Select Next to configure the model.
Configure Model

| Field | Description |
|---|---|
| Model Name | Enter the name of the ML model. |
| Training Description | Enter the training description. |
| Workspace | Select the SingleStore deployment (workspace) the notebook connects to. Specifying a workspace allows natively connecting to the SingleStore databases referenced in the notebook. |
| Compute Size | Select one of the available compute sizes. |
| Run as | Run the notebook for training a model with or without personal credentials. |

Select Next.
Select Training Data
| Field | Description |
|---|---|
| Database | Select the database that contains the training data. |
| Table | Select the table from the selected database to train the machine learning model. |
| Target Column | Select the column that represents the prediction target for the model. |
| Feature Selection Mode | Specify how feature columns are selected. |
| Feature Column | Select one or more columns to be used as input features for training the model. |

Preview the data and select Next.
Review the Summary and generated Fusion SQL syntax in the Generated SQL Script.
The generated script:

- Creates and trains an ML model
- Uses data from the selected table in the selected database
- Predicts values of the target column
- Runs on the selected compute instance
- Uses all available features by default
Following is the syntax of Fusion SQL script:
```
%s2ml train <machine_learning_algorithm> --model <model_name> --db <database_name> --input_table <table_name> --target_column <target_column> --description <training_description> --runtime <compute_instance> --selected_features { \"mode\": <feature_selection_mode>, \"features\": <feature_column> }
```
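As a concrete illustration, the classification model trained in the example notebook below corresponds to a command of roughly this form (the model, database, table, and column names all come from that example; treat the exact flag values as a sketch):

```
%s2ml train classification --model titanic_survival_predictor --db temp --input_table titanic_training_data --target_column survival_status --description "Titanic passenger survival prediction" --runtime cpu-small --selected_features { \"mode\": \"*\", \"features\": null }
```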
Select Start Training to train the ML model.
Example Notebooks
The following notebooks demonstrate how to use ML Functions:
ML Functions: Classification
Demonstrate ML function Classify
Note
You can use your existing Standard or Premium workspace with this Notebook.
This feature is currently in Private Preview. Please reach out to support@singlestore.com to confirm if this feature can be enabled in your org.
This Jupyter notebook will help you:
- Load the titanic dataset
- Store the data in a SingleStore table
- Use ML Functions for training and predictions
- Run some common Data Analysis tasks
Prerequisites: Ensure ML Functions are installed on your deployment (AI > AI & ML Functions).
```sql
%%sql
-- Ensure that ML_CLASSIFY is listed in the Functions_in_cluster column
show functions in cluster;
```
!pip install -q httplib2 seaborn pandas numpy scikit-learn
Load and Prepare the Titanic Dataset
We'll use the famous Titanic dataset from seaborn, which contains passenger information from the RMS Titanic. The goal is to predict whether a passenger survived based on features like age, sex, ticket class, and fare.
```sql
%%sql
CREATE DATABASE IF NOT EXISTS temp;
USE temp;
```
```python
import seaborn as sns
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

# Load the Titanic dataset
titanic_df = sns.load_dataset('titanic')

# Display basic information
print(f"Dataset shape: {titanic_df.shape}")
print(f"\nColumn names: {list(titanic_df.columns)}")
print(f"\nFirst 5 rows:")
print(titanic_df.head())

# Check survival distribution
print(f"\nSurvival Distribution:")
print(titanic_df['survived'].value_counts())
```
Clean and Prepare Features
We'll select the most important features and handle missing values to create a clean dataset for training.
```python
# Select relevant columns for prediction
columns_to_use = ['survived', 'pclass', 'sex', 'age', 'sibsp', 'parch', 'fare', 'embarked']
titanic_clean = titanic_df[columns_to_use].copy()

# Fill missing values
titanic_clean['age'] = titanic_clean['age'].fillna(titanic_clean['age'].median())
titanic_clean['fare'] = titanic_clean['fare'].fillna(titanic_clean['fare'].median())
titanic_clean['embarked'] = titanic_clean['embarked'].fillna('S')  # Most common port

# Drop any remaining rows with missing values
titanic_clean = titanic_clean.dropna()

# Convert survived to text labels for classification
titanic_clean['survival_status'] = titanic_clean['survived'].map({0: 'Died', 1: 'Survived'})

# Drop the original numeric survived column
titanic_clean = titanic_clean.drop('survived', axis=1)

print(f"Clean dataset shape: {titanic_clean.shape}")
print(f"\nMissing values per column:")
print(titanic_clean.isnull().sum())
print(f"\nSurvival status distribution:")
print(titanic_clean['survival_status'].value_counts())
print(f"\nFirst 5 rows of clean data:")
print(titanic_clean.head())
```
Split Data into Training and Test Sets
We'll split the data into 80% training and 20% test sets to evaluate model performance.
```python
# Split into train (80%) and test (20%) sets
train_df, test_df = train_test_split(
    titanic_clean,
    test_size=0.2,
    random_state=42,
    stratify=titanic_clean['survival_status']
)

print(f"Training set size: {len(train_df)} passengers")
print(f"Test set size: {len(test_df)} passengers")
print(f"\nTraining set survival distribution:")
print(train_df['survival_status'].value_counts())
print(f"\nTest set survival distribution:")
print(test_df['survival_status'].value_counts())
```
```sql
%%sql
DROP TABLE IF EXISTS titanic_training_data;
DROP TABLE IF EXISTS titanic_test_data;
DROP TABLE IF EXISTS titanic_predictions;

CREATE TABLE titanic_training_data (
    pclass INT,
    sex VARCHAR(10),
    age FLOAT,
    sibsp INT,
    parch INT,
    fare FLOAT,
    embarked VARCHAR(1),
    survival_status VARCHAR(10)
);

CREATE TABLE titanic_test_data (
    pclass INT,
    sex VARCHAR(10),
    age FLOAT,
    sibsp INT,
    parch INT,
    fare FLOAT,
    embarked VARCHAR(1),
    survival_status VARCHAR(10)
);

CREATE TABLE titanic_predictions (
    pclass INT,
    sex VARCHAR(10),
    age FLOAT,
    sibsp INT,
    parch INT,
    fare FLOAT,
    embarked VARCHAR(1),
    actual_status VARCHAR(10),
    predicted_status JSON
);
```
Load Data into SingleStore Tables
We'll use pandas to insert the training and test data into our SingleStore tables.
```python
import singlestoredb as s2

# Create engine with database specified
engine = s2.create_engine(database='temp')

# Insert training data
train_df.to_sql('titanic_training_data', con=engine, if_exists='append', index=False, method='multi')

# Insert test data
test_df.to_sql('titanic_test_data', con=engine, if_exists='append', index=False, method='multi')

print(f"Inserted {len(train_df)} rows into titanic_training_data")
print(f"Inserted {len(test_df)} rows into titanic_test_data")
```
Verify Data Load
Let's verify that our data was loaded correctly and review the passenger demographics.
```sql
%%sql
SELECT COUNT(*) as training_count FROM titanic_training_data;
```

```sql
%%sql
SELECT COUNT(*) as test_count FROM titanic_test_data;
```

```sql
%%sql
SELECT
    survival_status,
    COUNT(*) as passenger_count,
    ROUND(AVG(age), 1) as avg_age,
    ROUND(AVG(fare), 2) as avg_fare
FROM titanic_training_data
GROUP BY survival_status;

SELECT * FROM titanic_training_data LIMIT 5;
```
Train the ML Classification Model
Now we'll train an ML model using the %s2ml train magic command. This will use SingleStore's ML Functions to train a classification model that predicts passenger survival.
Note: Training may take several minutes depending on the compute size selected. The model will learn patterns like "women and children first" and the impact of ticket class on survival.
```
%%s2ml train as training_result
task: classification
model: titanic_survival_predictor
db: temp
input_table: titanic_training_data
target_column: survival_status
description: "Titanic passenger survival prediction based on demographics and ticket info"
runtime: cpu-small
selected_features: {"mode":"*","features":null}
```
Check Training Results
The training result is assigned to the variable training_result. Let's examine the training details.
```python
# Display the training result
training_result
```
Monitor Training Status
Use the %s2ml status command to view the model details and training status. The status will be one of: Pre-processing, Training, Completed, or Error.
%s2ml status --model titanic_survival_predictor
Note
Wait for training to complete before proceeding to the next section
You can re-run the cell above to check the status. Once the pipeline_status shows "Ready", you can proceed with predictions.
Run Sample Predictions
Once training is complete, let's run predictions on a few sample passengers from our test dataset to see how the model performs. Ensure that the correct database is still selected.
```sql
%%sql
USE temp;
```
```sql
%%sql
SELECT
    cluster.ML_CLASSIFY('titanic_survival_predictor', TO_JSON(passenger.*)) as predicted_status,
    passenger.survival_status as actual_status,
    passenger.pclass as ticket_class,
    passenger.sex,
    passenger.age,
    passenger.fare
FROM (SELECT * FROM titanic_test_data LIMIT 10) AS passenger;
```
Run Predictions on Full Test Dataset
Now let's run predictions on the entire test dataset and store the results in our predictions table.
```sql
%%sql
INSERT INTO titanic_predictions (
    pclass, sex, age, sibsp, parch, fare, embarked,
    actual_status, predicted_status
)
SELECT
    passenger.pclass,
    passenger.sex,
    passenger.age,
    passenger.sibsp,
    passenger.parch,
    passenger.fare,
    passenger.embarked,
    passenger.survival_status as actual_status,
    cluster.ML_CLASSIFY('titanic_survival_predictor', TO_JSON(passenger.*)) as predicted_status
FROM titanic_test_data AS passenger;
```
Evaluate Model Performance
Let's analyze the prediction accuracy by comparing actual vs predicted survival status.
```sql
%%sql
SELECT
    COUNT(*) as total_predictions,
    SUM(CASE WHEN actual_status = JSON_EXTRACT_STRING(predicted_status, 'predicted_label') THEN 1 ELSE 0 END) as correct_predictions,
    ROUND(100.0 * SUM(CASE WHEN actual_status = JSON_EXTRACT_STRING(predicted_status, 'predicted_label') THEN 1 ELSE 0 END) / COUNT(*), 2) as accuracy_percentage
FROM titanic_predictions;
```
Analyze Survival Factors
Let's examine how different passenger characteristics influenced survival predictions.
```sql
%%sql
-- Survival rate by sex
SELECT
    sex,
    COUNT(*) as total_passengers,
    SUM(CASE WHEN actual_status = 'Survived' THEN 1 ELSE 0 END) as actual_survivors,
    ROUND(100.0 * SUM(CASE WHEN actual_status = 'Survived' THEN 1 ELSE 0 END) / COUNT(*), 1) as survival_rate_pct
FROM titanic_predictions
GROUP BY sex
ORDER BY survival_rate_pct DESC;
```

```sql
%%sql
-- Survival rate by passenger class
SELECT
    pclass as ticket_class,
    COUNT(*) as total_passengers,
    SUM(CASE WHEN actual_status = 'Survived' THEN 1 ELSE 0 END) as actual_survivors,
    ROUND(100.0 * SUM(CASE WHEN actual_status = 'Survived' THEN 1 ELSE 0 END) / COUNT(*), 1) as survival_rate_pct,
    ROUND(AVG(fare), 2) as avg_fare_paid
FROM titanic_predictions
GROUP BY pclass
ORDER BY pclass;
```
Examine Misclassified Passengers
Let's look at passengers where the model made incorrect predictions to understand potential model limitations.
```sql
%%sql
SELECT
    actual_status,
    JSON_EXTRACT_STRING(predicted_status, 'predicted_label') as predicted_label,
    JSON_EXTRACT_DOUBLE(predicted_status, 'confidence') as confidence,
    pclass as ticket_class,
    sex,
    age,
    sibsp as siblings_spouses,
    parch as parents_children,
    fare,
    embarked
FROM titanic_predictions
WHERE actual_status != JSON_EXTRACT_STRING(predicted_status, 'predicted_label')
LIMIT 15;
```
Cleanup
```sql
%%sql
DROP TABLE IF EXISTS titanic_training_data;
DROP TABLE IF EXISTS titanic_test_data;
DROP TABLE IF EXISTS titanic_predictions;
DROP DATABASE IF EXISTS temp;
```

ML Functions: Anomaly Detection
Demonstrate ML function Anomaly Detect
Note
You can use your existing Standard or Premium workspace with this Notebook.
This feature is currently in Private Preview. Please reach out to support@singlestore.com to confirm if this feature can be enabled in your org.
This Jupyter notebook will help you:
- Download the bank transaction dataset from Kaggle for anomaly detection
- Store the data in a SingleStore table
- Use ML Functions for training and predictions
- Visualize the results of anomaly detection on test dataset
Prerequisites: Ensure ML Functions are installed on your deployment (AI > AI & ML Functions).
Step 1: Import necessary Libraries
pip install -q kagglehub
```python
import os
import pandas as pd
import singlestoredb as s2
import json
import kagglehub
import getpass
from singlestoredb import create_engine
from IPython.display import display
import plotly.graph_objects as go
from plotly.subplots import make_subplots
```
Step 2: Test Connection to SingleStore
```python
# Ensure that you have selected a SingleStore connection (workspace) before running this cell
try:
    conn = s2.connect()
    conn.autocommit(True)  # Set autocommit for notebook simplicity
    print("Connection successful!")
except Exception as e:
    print(f"Connection failed: {e}")
```
Step 3: Create Database and Tables
```python
# Create the database
db_name = "temp"
try:
    conn.cursor().execute(f"CREATE DATABASE IF NOT EXISTS {db_name};")
    conn.cursor().execute(f"USE {db_name};")
    print(f"Database '{db_name}' is ready.")
except Exception as e:
    print(e)
```
Step 4: Prepare and Load Data
We will use a publicly available Kaggle Bank Transactions dataset. We'll download the data into Pandas DataFrames and then load them into SingleStore.
Note
You may need to update your firewall allowlist so that this notebook instance can access the Kaggle server URL. If the notebook doesn't have access, a toast message appears when you run the following code cell, guiding you to add the URL.
```python
print("Downloading dataset from Kaggle...")
path = kagglehub.dataset_download("valakhorasani/bank-transaction-dataset-for-fraud-detection")
print(f"Dataset downloaded to: {path}")

# Read the CSV file
df = pd.read_csv(f"{path}/bank_transactions_data_2.csv")
df["TransactionDate"] = pd.to_datetime(df["TransactionDate"], format="%Y-%m-%d %H:%M:%S", errors="coerce")
df["PreviousTransactionDate"] = pd.to_datetime(df["PreviousTransactionDate"], format="%Y-%m-%d %H:%M:%S", errors="coerce")

# Display dataset info
print(f"\nDataset shape: {df.shape}")
print(f"Columns: {list(df.columns)}")
print("\nFirst few rows:")
df.head()
```
```python
try:
    # Create a SQLAlchemy engine
    engine = s2.create_engine(database=db_name)
    # Use pandas.to_sql to load the complete data to a table
    df.to_sql("metrics", con=engine, if_exists='append', index=False)
    print("Data loading complete.")
except Exception as e:
    print(f"Error loading data: {e}")
```
Create Train Test splits from complete data
Ensure that rows belonging to the same sequence are grouped into the same split. The following SQL query takes care of this split:
```sql
%%sql
USE "temp";
DROP TABLE IF EXISTS training_data;
DROP TABLE IF EXISTS test_data;

-- Training set
CREATE TABLE training_data AS
SELECT *
FROM (
    SELECT
        *,
        ROW_NUMBER() OVER (ORDER BY TransactionDate) AS rn,
        COUNT(*) OVER () AS total_rows
    FROM temp.metrics
) t
WHERE rn <= 0.8 * total_rows
ORDER BY TransactionDate;

-- Test set
CREATE TABLE test_data AS
SELECT *
FROM (
    SELECT
        *,
        ROW_NUMBER() OVER (ORDER BY TransactionDate) AS rn,
        COUNT(*) OVER () AS total_rows
    FROM temp.metrics
) t
WHERE rn > 0.8 * total_rows
ORDER BY TransactionDate;
```
Pre-process the train data set to include recent context rows
- Combine the last N (for example, 20) most recent rows from training_data with all rows from test_data, forming a continuous chronological dataset.
- The last N rows help provide recent context for model evaluation.
- The most recent 20 rows are then excluded from the final result since they are used only for observing short-term trends.
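The same combination step can be sketched client-side in pandas (a hypothetical equivalent; the notebook performs it in SQL, and the frame and column names here merely mirror the notebook with made-up data):

```python
# Prepend the last N training rows to the test rows so the model has recent
# context. Data below is synthetic, for illustration only.
import pandas as pd

train = pd.DataFrame({
    "TransactionDate": pd.date_range("2024-01-01", periods=50, freq="h"),
    "TransactionAmount": range(50),
})
test = pd.DataFrame({
    "TransactionDate": pd.date_range("2024-01-03 02:00", periods=10, freq="h"),
    "TransactionAmount": range(50, 60),
})

N = 20
recent_context = train.sort_values("TransactionDate").tail(N)
test_with_history = (
    pd.concat([recent_context, test])
    .sort_values("TransactionDate")
    .reset_index(drop=True)
)
print(len(test_with_history))  # 30: N context rows + all test rows
```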
```sql
%%sql
USE temp;

-- Drop old context table if it exists
DROP TABLE IF EXISTS test_data_with_history;

-- Create a new test_data_with_history table:
-- includes the last 20 rows from training_data + all rows from test_data
CREATE TABLE test_data_with_history AS
SELECT *
FROM (
    SELECT *
    FROM (
        SELECT *
        FROM training_data
        ORDER BY TransactionDate DESC
        LIMIT 20
    ) recent_training
    UNION ALL
    SELECT *
    FROM test_data
)
ORDER BY TransactionDate ASC;
```
Step 5: Train the Anomaly Detection Model
Now we'll use the %%s2ml train cell magic to train our AnomalyDetection model.
%s2ml list
Optionally delete any previous model
%s2ml delete --model cc_anomaly_model_v1 --f
```
%%s2ml train as anomaly_model
task: AnomalyDetection
model: cc_anomaly_model_v1
db: temp
input_table: training_data
target_column: TransactionAmount
target_time_column: TransactionDate
target_series_column: Channel
description: "Training an anomaly detection model"
runtime: cpu-small
selected_features: {"mode":"*","features":null}
force: True
```
Step 6: Check Model Status
Model training is an asynchronous job. We can use the %s2ml status magic to check on its progress. The model is ready to use when the status shows Completed.
(This may take a minute or two. Re-run the cell to refresh the status.)
%s2ml status --model cc_anomaly_model_v1
print(json.dumps(anomaly_model, indent=4))
Step 7: Run Predictions
Once the model is COMPLETED, you can use it directly in SQL via the ML_ANOMALY_DETECT() function.
Here is an example prediction for one row
```sql
%%sql
show tables in temp;
```
Run the model on a subset of data to view the results format.
```sql
%%sql
SELECT COUNT(*) as test_count FROM test_data_with_history;
```
```sql
%%sql
SELECT * FROM test_data_with_history ORDER BY TransactionDate ASC LIMIT 10;
```
```sql
%%sql
SELECT cluster.ML_ANOMALY_DETECT('cc_anomaly_model_v1', TO_JSON(selected_data.*)) AS is_anomaly
FROM (SELECT * FROM temp.test_data_with_history ORDER BY TransactionDate ASC) AS selected_data;
```
Step 8: Visualize Anomalies
```python
# Count the test records to determine how many rows to analyze
cursor = conn.cursor()
test_data_count = f"""SELECT count(*) FROM temp.test_data;"""
cursor.execute(test_data_count)
QUERY_LIMIT = cursor.fetchall()[0][0]  # Number of records to analyze
```
```python
# Run predictions
predictions = f"""
SELECT cluster.ML_ANOMALY_DETECT('cc_anomaly_model_v1', TO_JSON(selected_data.*)) AS is_anomaly
FROM (SELECT * FROM temp.test_data_with_history ORDER BY TransactionDate ASC) AS selected_data;
"""
cursor.execute(predictions)

# Parse and prepare data for visualization
column_names = [desc[0] for desc in cursor.description]
df = pd.DataFrame(list(cursor.fetchall()), columns=column_names)
print(f"✓ Retrieved {len(df)} records from database")
df.head()
```
```python
# ============================================
# AI generated code to visualize Predictions
# ============================================
SERIES_COLORS = {
    'ATM': '#1f77b4',
    'Branch': '#2ca02c',
    'Online': '#9467bd',
    'POS': '#d62728',
    'Mobile': '#ff7f0e'
}
DEFAULT_COLOR = '#8c564b'

# Parse JSON and filter out error rows
def safe_parse_json(x):
    try:
        parsed = json.loads(x)
        # Check if it's an error response
        if 'error' in parsed:
            return None
        return parsed
    except:
        return None

df['is_anomaly_parsed'] = df['is_anomaly'].apply(safe_parse_json)

# Filter out rows that failed to parse or had errors
df_valid = df[df['is_anomaly_parsed'].notna()].copy()
print(f"✓ Filtered out {len(df) - len(df_valid)} error/invalid rows")
print(f"✓ Processing {len(df_valid)} valid prediction rows")

if len(df_valid) == 0:
    print("❌ No valid predictions found. All rows contain errors.")
    print("\nSample error:")
    print(df['is_anomaly'].iloc[0])
else:
    # Check what keys are in the JSON
    print(f"✓ JSON keys found: {list(df_valid['is_anomaly_parsed'].iloc[0].keys())}")

    # Expand JSON into separate columns
    df_expanded = pd.json_normalize(df_valid['is_anomaly_parsed'])
    print(f"✓ Expanded columns: {list(df_expanded.columns)}")

    # Combine dataframes
    df_final = pd.concat([df_valid.reset_index(drop=True), df_expanded], axis=1)

    # Handle potential case sensitivity in column names
    # Check for different possible timestamp column names
    timestamp_col = None
    for possible_name in ['TS', 'ts', 'Ts', 'timestamp', 'Timestamp']:
        if possible_name in df_final.columns:
            timestamp_col = possible_name
            break
    if timestamp_col is None:
        print("Available columns:", list(df_final.columns))
        raise ValueError("Could not find timestamp column. Please check the column names above.")

    # Convert timestamp to datetime
    df_final['TS'] = pd.to_datetime(df_final[timestamp_col])
    print("Before drop:", len(df_final))

    # --- DROP WARM-UP CONTEXT ROWS ---
    warmup_count = 20
    df_final = df_final.sort_values('TS').reset_index(drop=True)
    if len(df_final) > warmup_count:
        df_final = df_final.iloc[warmup_count:].reset_index(drop=True)
        print(f"✓ Ignored the first {warmup_count} warm-up rows used for context.")
    print("After drop:", len(df_final))

    # Handle potential case sensitivity for Series column
    series_col = None
    for possible_name in ['Series', 'series', 'SERIES']:
        if possible_name in df_final.columns:
            series_col = possible_name
            break
    if series_col is None:
        raise ValueError("Could not find Series column")

    # Standardize column name
    if series_col != 'Series':
        df_final['Series'] = df_final[series_col]

    # Filter out any null Series values before sorting
    df_final = df_final[df_final['Series'].notna()].copy()

    # Handle potential case sensitivity for other columns
    column_mapping = {
        'Actual': ['Actual', 'actual'],
        'Forecast': ['Forecast', 'forecast'],
        'Lower bound': ['Lower bound', 'lower bound', 'lower_bound'],
        'Upper bound': ['Upper bound', 'upper bound', 'upper_bound'],
        'is_Anomaly': ['is_Anomaly', 'is_anomaly', 'isAnomaly']
    }
    for standard_name, possible_names in column_mapping.items():
        for possible_name in possible_names:
            if possible_name in df_final.columns:
                if possible_name != standard_name:
                    df_final[standard_name] = df_final[possible_name]
                break
    print(f"✓ Data preparation complete")

    # STEP 3: Dynamic series configuration
    series_types = sorted([s for s in df_final['Series'].unique() if isinstance(s, str)])
    num_series = len(series_types)
    print(f"✓ Found {num_series} unique series types: {', '.join(series_types)}")
    subplot_titles = [f"{series} Transactions" for series in series_types]
    row_heights = [1.0 / num_series] * num_series

    # STEP 4: Create optimized Plotly visualization
    fig = make_subplots(
        rows=num_series,
        cols=1,
        subplot_titles=tuple(subplot_titles),
        vertical_spacing=0.08,
        row_heights=row_heights
    )

    for idx, series_type in enumerate(series_types, start=1):
        series_data = df_final[df_final['Series'] == series_type].sort_values('TS')
        series_color = SERIES_COLORS.get(series_type, DEFAULT_COLOR)

        # Add confidence interval (upper bound)
        fig.add_trace(go.Scatter(
            x=series_data['TS'], y=series_data['Upper bound'],
            mode='lines', line=dict(width=0),
            showlegend=False, hoverinfo='skip', name='Upper Bound'
        ), row=idx, col=1)

        # Add confidence interval (lower bound with fill)
        fig.add_trace(go.Scatter(
            x=series_data['TS'], y=series_data['Lower bound'],
            mode='lines', line=dict(width=0),
            fillcolor='rgba(68, 168, 129, 0.12)', fill='tonexty',
            showlegend=False, hoverinfo='skip', name='Lower Bound'
        ), row=idx, col=1)

        # Add forecast line
        fig.add_trace(go.Scatter(
            x=series_data['TS'], y=series_data['Forecast'],
            mode='lines', name='Forecast',
            line=dict(color='gray', dash='dash', width=1.5),
            showlegend=(idx == 1),
            hovertemplate='<b>Forecast</b><br>%{x}<br>$%{y:.2f}<extra></extra>'
        ), row=idx, col=1)

        # Add actual values - Use Scattergl for performance
        fig.add_trace(go.Scattergl(
            x=series_data['TS'], y=series_data['Actual'],
            mode='lines', name='Actual',
            line=dict(color=series_color, width=2),
            showlegend=(idx == 1),
            hovertemplate='<b>Actual</b><br>%{x}<br>$%{y:.2f}<extra></extra>'
        ), row=idx, col=1)

        # Highlight anomalies
        anomalies = series_data[series_data['is_Anomaly'] == True]
        if len(anomalies) > 0:
            fig.add_trace(go.Scatter(
                x=anomalies['TS'], y=anomalies['Actual'],
                mode='markers', name='Anomaly',
                marker=dict(size=12, color='red', symbol='x',
                            line=dict(width=2, color='darkred')),
                showlegend=(idx == 1),
                hovertemplate='<b>⚠️ ANOMALY</b><br>%{x}<br>$%{y:.2f}<extra></extra>'
            ), row=idx, col=1)

        # Update axes
        fig.update_xaxes(title_text="Time", row=idx, col=1, showgrid=True,
                         gridwidth=1, gridcolor='rgba(128,128,128,0.1)')
        fig.update_yaxes(title_text="Amount ($)", row=idx, col=1, showgrid=True,
                         gridwidth=1, gridcolor='rgba(128,128,128,0.1)')

    # STEP 5: Optimized layout
    plot_height = max(900, 280 * num_series)
    fig.update_layout(
        height=plot_height,
        width=1400,
        title={
            'text': f"Credit Card Transaction Anomaly Detection<br><sub>ML_ANOMALY_DETECT - {len(df_final):,} Records | {df_final['is_Anomaly'].sum()} Anomalies Detected</sub>",
            'x': 0.5, 'xanchor': 'center',
            'y': 0.99, 'yanchor': 'top',
            'font': {'size': 18}
        },
        hovermode='closest',
        showlegend=True,
        legend=dict(
            orientation="h", yanchor="top", y=-0.02,
            xanchor="center", x=0.5,
            bgcolor='rgba(255,255,255,0.9)',
            bordercolor='rgba(0,0,0,0.2)', borderwidth=1,
            font=dict(size=12)
        ),
        template='plotly_white',
        margin=dict(l=80, r=40, t=100, b=80),
    )
    fig.show()

    # STEP 6: Enhanced summary statistics
    print("\n" + "=" * 75)
    print(f"{'ANOMALY DETECTION SUMMARY':^75}")
    print("=" * 75)
    print(f"Total records analyzed: {len(df_final):,}")
    print(f"Date range: {df_final['TS'].min().strftime('%Y-%m-%d')} to {df_final['TS'].max().strftime('%Y-%m-%d')}")
    print(f"Anomalies detected: {df_final['is_Anomaly'].sum()}")
    print(f"Overall anomaly rate: {(df_final['is_Anomaly'].sum() / len(df_final) * 100):.2f}%")
    print(f"\nUnique series types: {num_series}")
    print("\nBreakdown by Series:")
    print("-" * 75)
    for series in series_types:
        series_df = df_final[df_final['Series'] == series]
        series_count = len(series_df)
        anomaly_count = series_df['is_Anomaly'].sum()
        anomaly_rate = (anomaly_count / series_count * 100) if series_count > 0 else 0
        series_avg = series_df['Actual'].mean()
        series_max = series_df['Actual'].max()
        print(f"  {series:12s}: {anomaly_count:3d}/{series_count:4d} anomalies ({anomaly_rate:5.2f}%) | "
              f"Avg: ${series_avg:7.2f} | Max: ${series_max:8.2f}")
    print("=" * 75)
```
Step 9: Cleanup
Run the following commands to clean up all the created resources:
```sql
%%sql
USE temp;

-- Drop all tables created during the demo
DROP TABLE IF EXISTS results_table;
DROP TABLE IF EXISTS predict_table;
DROP TABLE IF EXISTS train_table;
DROP TABLE IF EXISTS transactions;
DROP TABLE IF EXISTS test_data_with_history;

-- Drop the database
DROP DATABASE IF EXISTS temp;
```
Manage an Existing ML Model
Existing ML models can be managed by performing the following actions:
- View details
- Run prediction
- Share
- Delete
View Details of an Existing ML Model
To view details of an existing ML model, select the ellipsis under the Actions column of the trained ML model, and select View Details.
Run Prediction on an Existing ML Model
Run batch prediction on the existing ML model.
Run a Batch Prediction
To run a batch prediction on the existing ML model, select the ellipsis under the Actions column of the trained ML model, and select Run Prediction.
Select Prediction Data
| Field | Description |
|---|---|
| Database | Select the database. |
| Target Table | Select the target table on which the prediction will be run. |
| Target Column | Select the target column on which the prediction will focus. |
| Timestamp Column | Select the column having timestamp data. |
Preview the data and select Next.
Configure Destination
| Field | Description |
|---|---|
| Prediction Interval Width | Select the interval width of the prediction. |
| Destination Table Name | Select the destination table in which the prediction results will be stored. |
| Destination Column | Select the destination column in which the prediction data will be saved. |
| Run as | Run the notebook with or without personal credentials. |
Review the Summary and generated Fusion SQL syntax in the Generated SQL Script.
View Predictions of an Existing ML Model
To view the predictions of the trained ML model, select the ML model in the Name column.
Share an Existing ML Model
To share an existing ML model, select the ellipsis under the Actions column of the trained ML model, and select Share.
Delete an Existing ML Model
To delete an existing ML model, select the ellipsis under the Actions column of the trained ML model, and select Delete.
Status of ML Models
| Status | Description |
|---|---|
| Pre-processing | The system is preparing data for ML model training. |
| Training | The ML model is currently being trained but results are not yet available. |
| Done | The ML model has been successfully trained and is ready for use. |
| Error | The ML model training or processing failed due to an error. |