# Load Data from Parquet Files Parquet formatted files can be loaded from a local file or by using a pipeline. The basic syntax is provided below: ## Load Parquet Files from a Local Filesystem Parquet formatted data stored on the local filesystem can be loaded using a `LOAD DATA` query. This streamlines the process of loading cloud-stored data into tables. Other `LOAD DATA` clauses (`SET`, `WHERE`, etc.) are supported (but not shown) in the following syntax examples. Syntax for loading a Parquet file: ```sql LOAD DATA INFILE '' INTO TABLE (val1 <- source1, val2 <- source2 [ ... ] ) [COMPRESSION { AUTO | NONE | LZ4 | GZIP }] [ ... ] FORMAT PARQUET; ``` ## Example: Loading Parquet Files Create a table: ```sql CREATE TABLE assets( Product_ID INT, Category VARCHAR(50), Model VARCHAR(50), Price DECIMAL(6,2), Employee_ID NUMERIC(5) ); ``` Create the LOAD DATA statement: Verify the file's location and the column names are correct. ```sql LOAD DATA INFILE '/tmp/assets.parquet' INTO TABLE assets (Product_ID <- Product_ID, Category <- Category, Model <- Model, Price <- Price, Employee_ID <- Employee_ID) FORMAT PARQUET; ``` Verify the data loaded: ```sql SELECT * FROM ASSETS; ``` ```output +------------+------------+-----------------+---------+-------------+ | Product_ID | Category | Model | Price | Employee_ID | +------------+------------+-----------------+---------+-------------+ | 58 | cell_phone | iphone_12 | 879.00 | 410 | | 1049 | laptop | mac_book_pro | 1999.00 | 14 | | 1100 | laptop | mac_book_pro | 1999.00 | 298 | | 210 | laptop | mac_book_pro | 2500.00 | 312 | | 130 | cell_phone | iphone_13 | 699.00 | 110 | | 99 | laptop | mac_book_air_M1 | 999.00 | 75 | | 140 | cell_phone | iphone_13_pro | 999.00 | 263 | | 49 | cell_phone | iphone_12 | 879.00 | 102 | | 2037 | laptop | mac_book_air_M2 | 1199.00 | 399 | | 111 | laptop | mac_book_pro | 2100.00 | 89 | +------------+------------+-----------------+---------+-------------+ ``` > **❗ Important**: Column names must be specified and they are case-sensitive. ## Load Parquet Files Using a Pipeline Parquet formatted data stored in an AWS S3 bucket can be loaded using a `LOAD DATA` query with a pipeline. Pipeline syntax: ```sql LOAD DATA S3 '' CONFIG '{"region" : ""}' CREDENTIALS '{"aws_access_key_id" : " ", "aws_secret_access_key": ""}' INTO TABLE (`` <- %, `` <- % DEFAULT NULL , ) FORMAT PARQUET; ``` ## Example: Parquet Pipeline This section will provide an example of creating a SQL table, a Parquet file for import, and a Parquet formatted pipeline. Create a table with the following structure: ```sql CREATE TABLE employees( ID NUMERIC(5), Last_Name VARCHAR(25), First_Name VARCHAR(25), Job_Title VARCHAR(25), Department VARCHAR(25), City VARCHAR(25), State VARCHAR(2), Email VARCHAR(50) ); ``` Install the `pandas` and `pyarrow` libraries for Parquet file support: ```python pip install pandas pyarrow ``` Create a file with data based on the table structure, be sure to save it as a Python file and use the correct path for the output: ```python import pandas as pd # Provided data data = [ ('014', 'Bateman', 'Patrick', 'Prod_Mgr', 'prod_dev', 'NYC', 'NY', 'pbateman@worldtech.com'), ('102', 'Karras', 'Damien', 'Doctor', 'R&D', 'NYC', 'NY', 'dkarras@worldtech.com'), ('298', 'Denbrough', 'Bill', 'Salesperson', 'Sales', 'Bangor', 'ME', 'bdenbrough@worldtech.com'), ('399', 'Torrance', 'Jack', 'PR Dir', 'PR', 'Estes Park', 'CO', 'jtorrance@worldtech.com'), ('410', 'Wilkes', 'Annie', 'HR Mgr', 'HR', 'Silver Creek', 'CO', 'awilkes@worldtech.com'), ('110', 'Strode', 'Laurie', 'VP Sales', 'Sales', 'Haddonfield', 'IL', 'lsatrode@worldtech.com'), ('312', 'Cady', 'Max', 'IT Dir', 'IT', 'New Essex', 'FL', 'mcady@worldtech.com'), ('089', 'Whateley', 'Wilbur', 'CEO', 'Sen_Mgmt', 'Dunwich', 'MA', 'wwhateley@worldtech.com'), ('075', 'White', 'Carrie', 'Receptionist', 'HR', 'Chamberlain', 'ME', 'cwhite@worldtech.com'), ('263', 'MacNeil', 'Regan', 'R&D Mgr', 'R&D', 'Washington', 'DC', 'rmacneil@worldtech.com') ] # Define column names columns = ['ID', 'Last_Name', 'First_Name', 'Job_Title', 'Department', 'City', 'State', 'Email'] # Create a DataFrame df = pd.DataFrame(data, columns=columns) # Save DataFrame to Parquet file df.to_parquet('//', index=False) ``` Once the file is saved, change the directory to the location of the Python script and run the script to generate the Parquet file: ```bash cd / ``` ```bash python .py ``` Upload the newly created Parquet file up to an S3 bucket and create the pipeline: ```sql CREATE PIPELINE emp_list AS LOAD DATA S3 's3:///employee_data.parquet' CONFIG '{"region":"us-west-2"}' CREDENTIALS '{"aws_access_key_id": "XXXXXXXXXX", "aws_secret_access_key": "XXXXXXXXXX"}' INTO TABLE employees (`ID` <- ID, `Last_Name` <- Last_Name, `First_Name` <- First_Name, `Job_Title` <- Job_Title, `Department` <- Department, `City` <- City, `State` <- State, `Email` <- Email) FORMAT PARQUET; ``` Start the pipeline: ```sql START PIPELINE emp_list; ``` Verify the data loaded correctly: ```sql SELECT * FROM employees; ``` ```output +------+-----------+------------+--------------+------------+--------------+-------+--------------------------+ | ID | Last_Name | First_Name | Job_Title | Department | City | State | Email | +------+-----------+------------+--------------+------------+--------------+-------+--------------------------+ | 263 | MacNeil | Regan | R&D Mgr | R&D | Washington | DC | rmacneil@worldtech.com | | 312 | Cady | Max | IT Dir | IT | New Essex | FL | mcady@worldtech.com | | 89 | Whateley | Wilbur | CEO | Sen_Mgmt | Dunwich | MA | wwhateley@worldtech.com | | 102 | Karras | Damien | Doctor | R&D | NYC | NY | dkarras@worldtech.com | | 14 | Bateman | Patrick | Prod_Mgr | prod_dev | NYC | NY | pbateman@worldtech.com | | 410 | Wilkes | Annie | HR Mgr | HR | Silver Creek | CO | awilkes@worldtech.com | | 75 | White | Carrie | Receptionist | HR | Chamberlain | ME | cwhite@worldtech.com | | 110 | Strode | Laurie | VP Sales | Sales | Haddonfield | IL | lsatrode@worldtech.com | | 298 | Denbrough | Bill | Salesperson | Sales | Bangor | ME | bdenbrough@worldtech.com | | 399 | Torrance | Jack | PR Dir | PR | Estes Park | CO | jtorrance@worldtech.com | +------+-----------+------------+--------------+------------+--------------+-------+--------------------------+ ``` > **❗ Important**: Column names must be specified and they are case-sensitive. ## In this section * [Create a Parquet Pipeline](https://docs.singlestore.com/db/v9.1/load-data/load-data-from-files/load-data-from-parquet-files/create-a-parquet-pipeline.md) * [Load Parquet Data using LOAD DATA](https://docs.singlestore.com/db/v9.1/load-data/load-data-from-files/load-data-from-parquet-files/load-parquet-data-using-load-data.md) *** Modified at: March 26, 2025 Source: [/db/v9.1/load-data/load-data-from-files/load-data-from-parquet-files/](https://docs.singlestore.com/db/v9.1/load-data/load-data-from-files/load-data-from-parquet-files/) (An index of the documentation is available at /llms.txt)