Stage

Stage is a storage service that helps you organize and manage local files for ingestion into your SingleStore Helios database(s). Each workspace group has a Stage where you can create folders and upload files. Stage is also supported in the Shared Edition (starter workspaces). You can also save query results to files in a Stage.

Note

The workspace group must be running SingleStore version 8.1 or later.

Manage a Stage

You can manage files and folders in a Stage using any of the following:

  • Cloud Portal UI

  • Management API

  • Notebooks

  • SingleStore Python Client

Using the Cloud Portal

Upload a File

To upload a file to a Stage:

  1. Select Deployments -> <your_workspace_group> -> Stage -> Upload File(s).

  2. In the Upload File(s) dialog, either drag and drop a file to the dialog or select Browse Files.

  3. Once the file is loaded, select Upload File.

Create a Folder

  1. Select Deployments -> <your_workspace_group> -> Stage.

  2. Select the Create Folder button.

  3. In the Create Folder dialog, enter a name for the folder, and select Create Folder.

Using the Management API

Use the Stage path (/v1/stage endpoint) in the Management API to manage files and folders in a Stage. Refer to Management API and Management API Reference for more information.

For example, the following API call lists all the files and folders in the Stage attached to the workspace group with the specified ID:

curl -X 'GET' \
'https://api.singlestore.com/v1/stage/68af2f46-0000-1000-9000-3f6f5365d878/fs/' \
-H 'accept: application/json'

Using Notebooks or SingleStore Python Client

The SingleStore Python SDK supports the Stage object, which can be used to manage files and folders in a Stage. You can also use the Stage object (along with the other objects in the SingleStore Python SDK) in a Notebook. Refer to the SingleStore Python Client and SingleStore Python SDK API Reference for more information.

For example, the following code snippet uploads a file named data.csv to a Stage attached to a workspace group named examplewsg:

from singlestoredb import manage_workspaces
mgr = manage_workspaces('access_key_token_for_the_Management_API')
wg = mgr.workspace_groups['examplewsg']
wg.stage.upload_file('/filepath/data.csv', '/data.csv')
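Other Stage operations follow the same pattern. The following sketch (assuming the same workspace group and access token as above) lists the contents of the Stage and downloads a file; verify the exact method names and signatures against the SingleStore Python SDK API Reference:

```python
from singlestoredb import manage_workspaces

# Connect with a Management API access token (placeholder value)
mgr = manage_workspaces('access_key_token_for_the_Management_API')
wg = mgr.workspace_groups['examplewsg']

# List the files and folders at the root of the Stage
for path in wg.stage.listdir('/'):
    print(path)

# Download a file from the Stage to a local path
wg.stage.download_file('/data.csv', '/filepath/data.csv')
```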

Ingest a File using Stage

Files can be ingested into a database from a Stage using the Cloud Portal, the LOAD DATA command, or a pipeline.

Using the Cloud Portal

  1. Under Stage, select the ellipsis (three dots) in the Actions column of the file to load, and then select Load To Database.

  2. In the Load Data dialog, from the Choose Workspace list, select a workspace.

  3. From the Choose a database list, select a database.

  4. In the Table box, select an existing table or enter a new table name.

  5. Select the Generate Notebook button. A notebook is created that contains the queries used to load the file into the selected table.

    You may edit the queries in the notebook to include different column names, column types, etc.

  6. Select Run > Run All Cells.

  7. Run the Check that the data has loaded cell to verify the loaded data.

Using the LOAD DATA command

Create a table with a structure that can store data from the file. Use the following LOAD DATA syntax to load a file from a Stage:

LOAD DATA STAGE 'path_in_stage/filename.extension'
INTO TABLE <table_name>
[FORMAT {JSON | AVRO | CSV}];

Refer to LOAD DATA for a complete syntax and related information.

The following example loads data from a CSV file from a Stage:

LOAD DATA STAGE 'simple.csv'
INTO TABLE simple_data
FIELDS TERMINATED BY ','
IGNORE 1 LINES;
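The example above assumes a target table whose columns match the CSV file. A minimal sketch of such a table follows; the column names and types are illustrative, not taken from an actual file:

```sql
-- Illustrative table matching a three-column simple.csv
CREATE TABLE simple_data (
    id INT,
    name VARCHAR(100),
    amount DECIMAL(10,2)
);
```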

Note

The LOAD DATA STAGE command is not supported in the Shared Edition.

Using Pipelines

Create a table with a structure that can store the data from the file. Use the following CREATE PIPELINE syntax to load a file from a Stage:

CREATE PIPELINE <pipeline_name>
AS LOAD DATA STAGE '<path_in_Stage/filename>' [ <pipeline_options> ]
INTO TABLE <table_name>
[ <data_format_options> ];

Once the table and pipeline are created, start the pipeline. Refer to CREATE PIPELINE for the complete syntax and related information.

Here's a sample CREATE PIPELINE statement that loads data from a CSV file:

CREATE PIPELINE dbTest.plTest
AS LOAD DATA STAGE 'data.csv'
BATCH_INTERVAL 2500
SKIP DUPLICATE KEY ERRORS
INTO TABLE t1
FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\'
LINES TERMINATED BY '\n' STARTING BY ''
FORMAT CSV;
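Before the pipeline can run, its target table must exist; once both are created, start the pipeline. The table definition below is illustrative (the column list is assumed, not taken from an actual data.csv):

```sql
-- Illustrative target table for the pipeline above
CREATE TABLE dbTest.t1 (
    id INT,
    name VARCHAR(100)
);

-- Optionally preview rows without committing them
TEST PIPELINE dbTest.plTest;

-- Begin ingesting the Stage file into the table
START PIPELINE dbTest.plTest;
```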

Export SQL Results to a Stage

SQL results may be exported to a Stage as follows:

SELECT * FROM <table_name> GROUP BY 1 INTO STAGE '<table_results.csv>';

Use the GROUP BY 1 clause to gather the results into a single file; without it, each leaf node may write its own output file.
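For example, the following sketch exports a table to a CSV file in the Stage; the table and file names are illustrative, and the formatting clauses follow the usual SELECT ... INTO syntax (verify them against the SELECT reference):

```sql
SELECT * FROM simple_data
GROUP BY 1
INTO STAGE 'simple_data_results.csv'
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
```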

Supported Files

The Stage storage service supports the following file formats:

  • CSV

  • SQL

  • JSON

  • Parquet

  • GZ

  • Zstd

  • Snappy

Storage Limits

Each Stage provides up to 10 GB of storage for free. Individual files must not exceed 5 GB in size.

Last modified: November 12, 2024
