Load Data from Parquet Files
On this page
Parquet formatted files can be loaded from a local file or by using a pipeline.
Load Parquet Files from a Local Filesystem
Parquet formatted data stored on the local filesystem can be loaded using a LOAD DATA query.LOAD DATA clauses (SET, WHERE, etc.
Syntax for loading a Parquet file:
LOAD DATA INFILE '<path_to_file/file_name>'INTO TABLE <table_name>(val1 <- source1,val2 <- source2[ ... ]) [COMPRESSION { AUTO | NONE | LZ4 | GZIP }][ ... ]FORMAT PARQUET;
Example: Loading Parquet Files
Create a table:
CREATE TABLE assets(Product_ID INT,Category VARCHAR(50),Model VARCHAR(50),Price DECIMAL(6,2),Employee_ID NUMERIC(5));
Create the LOAD DATA statement:
Verify the file's location and the column names are correct.
LOAD DATA INFILE '/tmp/assets.parquet'INTO TABLE assets(Product_ID <- Product_ID,Category <- Category,Model <- Model,Price <- Price,Employee_ID <- Employee_ID)FORMAT PARQUET;
Verify the data loaded:
SELECT * FROM ASSETS;
+------------+------------+-----------------+---------+-------------+
| Product_ID | Category   | Model           | Price   | Employee_ID |
+------------+------------+-----------------+---------+-------------+
|         58 | cell_phone | iphone_12       |  879.00 |         410 |
|       1049 | laptop     | mac_book_pro    | 1999.00 |          14 |
|       1100 | laptop     | mac_book_pro    | 1999.00 |         298 |
|        210 | laptop     | mac_book_pro    | 2500.00 |         312 |
|        130 | cell_phone | iphone_13       |  699.00 |         110 |
|         99 | laptop     | mac_book_air_M1 |  999.00 |          75 |
|        140 | cell_phone | iphone_13_pro   |  999.00 |         263 |
|         49 | cell_phone | iphone_12       |  879.00 |         102 |
|       2037 | laptop     | mac_book_air_M2 | 1199.00 |         399 |
|        111 | laptop     | mac_book_pro    | 2100.00 |          89 |
+------------+------------+-----------------+---------+-------------+Important
Column names must be specified and they are case-sensitive.
Load Parquet Files Using a Pipeline
Parquet formatted data stored in an AWS S3 bucket can be loaded using a LOAD DATA query with a pipeline.
Pipeline syntax:
LOAD DATA S3 '<bucket name>'CONFIG '{"region" : "<region_name>"}'CREDENTIALS '{"aws_access_key_id" : "<key_id> ","aws_secret_access_key": "<access_key>"}'INTO TABLE <table_name>(`<col_a>` <- %,`<col_b>` <- % DEFAULT NULL ,) FORMAT PARQUET;
Example: Parquet Pipeline
This section will provide an example of creating a SQL table, a Parquet file for import, and a Parquet formatted pipeline.
Create a table with the following structure:
CREATE TABLE employees(ID NUMERIC(5),Last_Name VARCHAR(25),First_Name VARCHAR(25),Job_Title VARCHAR(25),Department VARCHAR(25),City VARCHAR(25),State VARCHAR(2),Email VARCHAR(50));
Install the pandas and pyarrow libraries for Parquet file support:
pip install pandas pyarrow
Create a file with data based on the table structure, be sure to save it as a Python file and use the correct path for the output:
import pandas as pd# Provided datadata = [('014', 'Bateman', 'Patrick', 'Prod_Mgr', 'prod_dev', 'NYC', 'NY', 'pbateman@worldtech.com'),('102', 'Karras', 'Damien', 'Doctor', 'R&D', 'NYC', 'NY', 'dkarras@worldtech.com'),('298', 'Denbrough', 'Bill', 'Salesperson', 'Sales', 'Bangor', 'ME', 'bdenbrough@worldtech.com'),('399', 'Torrance', 'Jack', 'PR Dir', 'PR', 'Estes Park', 'CO', 'jtorrance@worldtech.com'),('410', 'Wilkes', 'Annie', 'HR Mgr', 'HR', 'Silver Creek', 'CO', 'awilkes@worldtech.com'),('110', 'Strode', 'Laurie', 'VP Sales', 'Sales', 'Haddonfield', 'IL', 'lsatrode@worldtech.com'),('312', 'Cady', 'Max', 'IT Dir', 'IT', 'New Essex', 'FL', 'mcady@worldtech.com'),('089', 'Whateley', 'Wilbur', 'CEO', 'Sen_Mgmt', 'Dunwich', 'MA', 'wwhateley@worldtech.com'),('075', 'White', 'Carrie', 'Receptionist', 'HR', 'Chamberlain', 'ME', 'cwhite@worldtech.com'),('263', 'MacNeil', 'Regan', 'R&D Mgr', 'R&D', 'Washington', 'DC', 'rmacneil@worldtech.com')]# Define column namescolumns = ['ID', 'Last_Name', 'First_Name', 'Job_Title', 'Department', 'City', 'State', 'Email']# Create a DataFramedf = pd.DataFrame(data, columns=columns)# Save DataFrame to Parquet filedf.to_parquet('/<path to file>/<file_name.parquet>', index=False)
Once the file is saved, change the directory to the location of the Python script and run the script to generate the Parquet file:
cd /<path/to/python-script/directory>
python <create_parquet_file>.py
Upload the newly created Parquet file up to an S3 bucket and create the pipeline:
CREATE PIPELINE emp_list AS LOAD DATA S3 's3://<path to file>/employee_data.parquet'CONFIG '{"region":"us-west-2"}' CREDENTIALS '{"aws_access_key_id": "XXXXXXXXXX","aws_secret_access_key": "XXXXXXXXXX"}'INTO TABLE employees(`ID` <- ID,`Last_Name` <- Last_Name,`First_Name` <- First_Name,`Job_Title` <- Job_Title,`Department` <- Department,`City` <- City,`State` <- State,`Email` <- Email)FORMAT PARQUET;
Start the pipeline:
START PIPELINE emp_list;
Verify the data loaded correctly:
SELECT * FROM employees;
+------+-----------+------------+--------------+------------+--------------+-------+--------------------------+
| ID   | Last_Name | First_Name | Job_Title    | Department | City         | State | Email                    |
+------+-----------+------------+--------------+------------+--------------+-------+--------------------------+
|  263 | MacNeil   | Regan      | R&D Mgr      | R&D        | Washington   | DC    | rmacneil@worldtech.com   |
|  312 | Cady      | Max        | IT Dir       | IT         | New Essex    | FL    | mcady@worldtech.com      |
|   89 | Whateley  | Wilbur     | CEO          | Sen_Mgmt   | Dunwich      | MA    | wwhateley@worldtech.com  |
|  102 | Karras    | Damien     | Doctor       | R&D        | NYC          | NY    | dkarras@worldtech.com    |
|   14 | Bateman   | Patrick    | Prod_Mgr     | prod_dev   | NYC          | NY    | pbateman@worldtech.com   |
|  410 | Wilkes    | Annie      | HR Mgr       | HR         | Silver Creek | CO    | awilkes@worldtech.com    |
|   75 | White     | Carrie     | Receptionist | HR         | Chamberlain  | ME    | cwhite@worldtech.com     |
|  110 | Strode    | Laurie     | VP Sales     | Sales      | Haddonfield  | IL    | lsatrode@worldtech.com   |
|  298 | Denbrough | Bill       | Salesperson  | Sales      | Bangor       | ME    | bdenbrough@worldtech.com |
|  399 | Torrance  | Jack       | PR Dir       | PR         | Estes Park   | CO    | jtorrance@worldtech.com  |
+------+-----------+------------+--------------+------------+--------------+-------+--------------------------+Important
Column names must be specified and they are case-sensitive.
In this section
Last modified: March 26, 2025