Load Data from Parquet Files
Warning
SingleStore 9.0 gives you the opportunity to preview, evaluate, and provide feedback on new and upcoming features prior to their general availability. In the interim, SingleStore 8.9 is recommended for production workloads; it can later be upgraded to SingleStore 9.0.
Parquet formatted files can be loaded from a local file or by using a pipeline.
Load Parquet Files from a Local Filesystem
Parquet formatted data stored on the local filesystem can be loaded using a LOAD DATA query. The standard LOAD DATA clauses (SET, WHERE, etc.) are also supported when loading Parquet files; a sketch that uses them follows the syntax below.
Syntax for loading a Parquet file:
LOAD DATA INFILE '<path_to_file/file_name>'
INTO TABLE <table_name>
(val1 <- source1,
 val2 <- source2
 [ ... ])
[COMPRESSION { AUTO | NONE | LZ4 | GZIP }]
[ ... ]
FORMAT PARQUET;
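For example, the optional clauses can transform or filter rows as they are loaded. The following is a minimal sketch with hypothetical file, table, and column names: the source price field is read into a user variable, rewritten with SET, and rows without an id are skipped with WHERE.

LOAD DATA INFILE '/tmp/example.parquet'
INTO TABLE example_table
(id <- id,
 @price <- price)
FORMAT PARQUET
SET price = @price * 0.9    -- transform the value while loading
WHERE id IS NOT NULL;       -- skip rows with a NULL id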
Example: Loading Parquet Files
Create a table:
CREATE TABLE assets(
  Product_ID INT,
  Category VARCHAR(50),
  Model VARCHAR(50),
  Price DECIMAL(6,2),
  Employee_ID NUMERIC(5));
Create the LOAD DATA statement:
Verify that the file location and column names are correct.
LOAD DATA INFILE '/tmp/assets.parquet'
INTO TABLE assets
(Product_ID <- Product_ID,
 Category <- Category,
 Model <- Model,
 Price <- Price,
 Employee_ID <- Employee_ID)
FORMAT PARQUET;
Verify the data loaded:
SELECT * FROM assets;
+------------+------------+-----------------+---------+-------------+
| Product_ID | Category | Model | Price | Employee_ID |
+------------+------------+-----------------+---------+-------------+
| 58 | cell_phone | iphone_12 | 879.00 | 410 |
| 1049 | laptop | mac_book_pro | 1999.00 | 14 |
| 1100 | laptop | mac_book_pro | 1999.00 | 298 |
| 210 | laptop | mac_book_pro | 2500.00 | 312 |
| 130 | cell_phone | iphone_13 | 699.00 | 110 |
| 99 | laptop | mac_book_air_M1 | 999.00 | 75 |
| 140 | cell_phone | iphone_13_pro | 999.00 | 263 |
| 49 | cell_phone | iphone_12 | 879.00 | 102 |
| 2037 | laptop | mac_book_air_M2 | 1199.00 | 399 |
| 111 | laptop | mac_book_pro | 2100.00 | 89 |
+------------+------------+-----------------+---------+-------------+
Important
Column names must be specified and they are case-sensitive.
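In the column mapping, the name to the left of <- is the destination table column and the name to the right is the field in the Parquet file, so the two can differ. The following is a minimal sketch assuming a hypothetical file whose fields use lowercase names while the assets table keeps the names shown above:

LOAD DATA INFILE '/tmp/assets_lowercase.parquet'
INTO TABLE assets
(Product_ID <- product_id,   -- table column <- Parquet field
 Category <- category,
 Model <- model,
 Price <- price,
 Employee_ID <- employee_id)
FORMAT PARQUET;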
Load Parquet Files Using a Pipeline
Parquet formatted data stored in an AWS S3 bucket can be loaded using a LOAD DATA query with a pipeline.
Pipeline syntax:
CREATE PIPELINE <pipeline_name> AS
LOAD DATA S3 '<bucket_name>'
CONFIG '{"region" : "<region_name>"}'
CREDENTIALS '{"aws_access_key_id" : "<key_id>",
              "aws_secret_access_key" : "<access_key>"}'
INTO TABLE <table_name>
(`<col_a>` <- %,
 `<col_b>` <- % DEFAULT NULL)
FORMAT PARQUET;
Example: Parquet Pipeline
This section provides an example of creating a SQL table, generating a Parquet file for import, and loading that file with a pipeline.
Create a table with the following structure:
CREATE TABLE employees(
  ID NUMERIC(5),
  Last_Name VARCHAR(25),
  First_Name VARCHAR(25),
  Job_Title VARCHAR(25),
  Department VARCHAR(25),
  City VARCHAR(25),
  State VARCHAR(2),
  Email VARCHAR(50));
Install the pandas and pyarrow libraries for Parquet file support:
pip install pandas pyarrow
Create a Python script that generates data based on the table structure. Be sure to save it with a .py extension and use the correct output path for the Parquet file:
import pandas as pd

# Provided data
data = [('014', 'Bateman', 'Patrick', 'Prod_Mgr', 'prod_dev', 'NYC', 'NY', 'pbateman@worldtech.com'),
        ('102', 'Karras', 'Damien', 'Doctor', 'R&D', 'NYC', 'NY', 'dkarras@worldtech.com'),
        ('298', 'Denbrough', 'Bill', 'Salesperson', 'Sales', 'Bangor', 'ME', 'bdenbrough@worldtech.com'),
        ('399', 'Torrance', 'Jack', 'PR Dir', 'PR', 'Estes Park', 'CO', 'jtorrance@worldtech.com'),
        ('410', 'Wilkes', 'Annie', 'HR Mgr', 'HR', 'Silver Creek', 'CO', 'awilkes@worldtech.com'),
        ('110', 'Strode', 'Laurie', 'VP Sales', 'Sales', 'Haddonfield', 'IL', 'lsatrode@worldtech.com'),
        ('312', 'Cady', 'Max', 'IT Dir', 'IT', 'New Essex', 'FL', 'mcady@worldtech.com'),
        ('089', 'Whateley', 'Wilbur', 'CEO', 'Sen_Mgmt', 'Dunwich', 'MA', 'wwhateley@worldtech.com'),
        ('075', 'White', 'Carrie', 'Receptionist', 'HR', 'Chamberlain', 'ME', 'cwhite@worldtech.com'),
        ('263', 'MacNeil', 'Regan', 'R&D Mgr', 'R&D', 'Washington', 'DC', 'rmacneil@worldtech.com')]

# Define column names
columns = ['ID', 'Last_Name', 'First_Name', 'Job_Title', 'Department', 'City', 'State', 'Email']

# Create a DataFrame
df = pd.DataFrame(data, columns=columns)

# Save DataFrame to Parquet file
df.to_parquet('/<path to file>/<file_name.parquet>', index=False)
Once the file is saved, change to the directory containing the Python script and run it to generate the Parquet file:
cd /<path/to/python-script/directory>
python <create_parquet_file>.py
Upload the newly created Parquet file to an S3 bucket and create the pipeline:
CREATE PIPELINE emp_list AS
LOAD DATA S3 's3://<path to file>/employee_data.parquet'
CONFIG '{"region":"us-west-2"}'
CREDENTIALS '{"aws_access_key_id": "XXXXXXXXXX",
              "aws_secret_access_key": "XXXXXXXXXX"}'
INTO TABLE employees
(`ID` <- ID,
 `Last_Name` <- Last_Name,
 `First_Name` <- First_Name,
 `Job_Title` <- Job_Title,
 `Department` <- Department,
 `City` <- City,
 `State` <- State,
 `Email` <- Email)
FORMAT PARQUET;
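Optionally, before starting the pipeline, run it against a small sample of the source data to confirm the column mapping and credentials work as expected. A minimal sketch using the emp_list pipeline created above:

TEST PIPELINE emp_list LIMIT 1;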
Start the pipeline:
START PIPELINE emp_list;
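Pipelines ingest data asynchronously, so it can be useful to check the pipeline's state and any ingest errors before querying the table. A short sketch using the pipeline metadata SingleStore exposes:

-- Show pipelines in the current database and their state
SHOW PIPELINES;

-- Check for errors recorded while ingesting files
SELECT * FROM information_schema.PIPELINES_ERRORS
WHERE PIPELINE_NAME = 'emp_list';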
Verify the data loaded correctly:
SELECT * FROM employees;
+------+-----------+------------+--------------+------------+--------------+-------+--------------------------+
| ID | Last_Name | First_Name | Job_Title | Department | City | State | Email |
+------+-----------+------------+--------------+------------+--------------+-------+--------------------------+
| 263 | MacNeil | Regan | R&D Mgr | R&D | Washington | DC | rmacneil@worldtech.com |
| 312 | Cady | Max | IT Dir | IT | New Essex | FL | mcady@worldtech.com |
| 89 | Whateley | Wilbur | CEO | Sen_Mgmt | Dunwich | MA | wwhateley@worldtech.com |
| 102 | Karras | Damien | Doctor | R&D | NYC | NY | dkarras@worldtech.com |
| 14 | Bateman | Patrick | Prod_Mgr | prod_dev | NYC | NY | pbateman@worldtech.com |
| 410 | Wilkes | Annie | HR Mgr | HR | Silver Creek | CO | awilkes@worldtech.com |
| 75 | White | Carrie | Receptionist | HR | Chamberlain | ME | cwhite@worldtech.com |
| 110 | Strode | Laurie | VP Sales | Sales | Haddonfield | IL | lsatrode@worldtech.com |
| 298 | Denbrough | Bill | Salesperson | Sales | Bangor | ME | bdenbrough@worldtech.com |
| 399 | Torrance | Jack | PR Dir | PR | Estes Park | CO | jtorrance@worldtech.com |
+------+-----------+------------+--------------+------------+--------------+-------+--------------------------+
Important
Column names must be specified and they are case-sensitive.
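When the example is complete, the pipeline can be stopped and removed; the employees table and its rows are not affected:

STOP PIPELINE emp_list;
DROP PIPELINE emp_list;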
Last modified: March 26, 2025