Load Data in JSON Format from Amazon S3 Using a Wildcard
SingleStore Pipelines can read data from JSON-formatted files stored in an S3 bucket by using a wildcard (*).
Prerequisites
The following prerequisites need to be in place before creating a pipeline using a JSON wildcard.
- Two or more JSON files for upload to an S3 bucket.
- Access to the S3 bucket.
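The steps below also assume a database is selected in which to create the table and pipeline. If needed, create one first; the database name employee_data here is only an illustrative placeholder:
CREATE DATABASE employee_data;
USE employee_data;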
Create a Table
Create a table using the following syntax:
CREATE TABLE employees (
  lastname TEXT CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  firstname TEXT CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  age INT NOT NULL,
  DOB TEXT CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  partner TEXT CHARACTER SET utf8 COLLATE utf8_general_ci,
  hasChildren BOOLEAN,
  children JSON COLLATE utf8_bin,
  SHARD KEY ()
);
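To confirm the table was created as intended, you can inspect its definition. (The empty SHARD KEY () clause shards rows across partitions without tying distribution to a particular column.)
DESCRIBE employees;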
The JSON file contents are below.
Contents of emp1:
{"lastname": "Bateman","firstname": "Patrick","DOB" : "October 23, 1982","age": "40","partner": "Sharon Bateman","hasChildren": true,"children":["Isabell", "Jackson"]}{"lastname": "Denbrough","firstname": "Bill","DOB" : "January 04, 1990","age": "33","partner": "Audra Phillips","hasChildren": true,"children":["Stephen", "Maya", "Hunter"]}{"lastname": "Torance","firstname": "Jack","DOB" : "June 20, 1980","age": "43","partner": "Wendy Torrance","hasChildren": true,"children":["Danny", "Lucy"]}{"lastname": "Strode","firstname": "Laurie","DOB" : "February 22, 1987","age": "36","partner": null,"hasChildren": true,"children":["Jamie Lloyd", "John Tate", "Karen Nelson"]}{"lastname": "Whateley","firstname": "Wilbur","DOB" : "February 02, 1970","age": "53","partner": null,"hasChildren": true,"children":["Wilbur Jr.", "Eleanor"]}
Contents of emp2:
{"lastname": "Karras","firstname": "Damien","DOB" : "April 12, 1973","age": "50","partner": null,"hasChildren": false,"children": null}{"lastname": "Wilkes","firstname": "Annie","DOB" : "April 01, 1972","age": "51","partner":null,"hasChildren": false,"children": null}{"lastname": "Cady","firstname": "Max","DOB" : "September 13, 1989","age": "33","partner": null,"hasChildren": false,"children": null}{"lastname": "White","firstname": "Carrie","DOB" : "August 25, 2000","age": "22","partner": null,"hasChildren": false,"children": null}{"lastname": "MacNeil","firstname": "Regan","DOB" : "April 07, 1985","age": "38","partner": "Orlando Knotts","hasChildren": false,"children": null}{"lastname": "Eriksson","firstname": "Oskar","DOB" : "May 29, 2001","age": "22","partner": null,"hasChildren": false,"children": null}{"lastname": "Grevers","firstname": "Nick","DOB" : "November 21, 1995","age": "28","partner" : "Sam Avery","hasChildren": false,"children": null}
Create a Pipeline to the S3 Bucket
Make sure to add the correct pipeline name, bucket name, folder name (if applicable), and AWS credentials.
CREATE PIPELINE emp_data AS
LOAD DATA S3 '<bucket_name>/<folder_name>/*.json'
CONFIG '{"region":"us-west-2"}'
CREDENTIALS '{"aws_access_key_id": "<xxxxxxxxxxxxxxx>",
              "aws_secret_access_key": "<xxxxxxxxxxxxxxx>"}'
BATCH_INTERVAL 2500
MAX_PARTITIONS_PER_BATCH 1
DISABLE OUT_OF_ORDER OPTIMIZATION
DISABLE OFFSETS METADATA GC
SKIP DUPLICATE KEY ERRORS
INTO TABLE employees
FORMAT JSON
(`lastname` <- `lastname` default '',
 `firstname` <- `firstname` default '',
 `age` <- `age` default -1,
 `DOB` <- `DOB` default -1,
 `partner` <- `partner` default '',
 `hasChildren` <- `hasChildren` default '',
 `children` <- `children` default '');
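Optionally, before starting the pipeline, run a quick sanity check. TEST PIPELINE reads a sample of records from the source without writing anything to the table:
TEST PIPELINE emp_data LIMIT 1;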
Start the pipeline.
START PIPELINE emp_data;
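To confirm the pipeline is running and to watch its batches complete, you can check the pipeline metadata:
SHOW PIPELINES;
SELECT * FROM information_schema.pipelines_batches_summary
WHERE pipeline_name = 'emp_data';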
Verify the data has loaded.
SELECT * FROM employees;
+-----------+-----------+-----+--------------------+----------------+-------------+--------------------------------------------+
| lastname  | firstname | age | DOB                | partner        | hasChildren | children                                   |
+-----------+-----------+-----+--------------------+----------------+-------------+--------------------------------------------+
| Bateman   | Patrick   |  40 | October 23, 1982   | Sharon Bateman |           1 | ["Isabell","Jackson"]                      |
| Eriksson  | Oskar     |  22 | May 29, 2001       | NULL           |           0 | NULL                                       |
| MacNeil   | Regan     |  38 | April 07, 1985     | Orlando Knotts |           0 | NULL                                       |
| Grevers   | Nick      |  28 | November 21, 1995  | Sam Avery      |           0 | NULL                                       |
| Karras    | Damien    |  50 | April 12, 1973     | NULL           |           0 | NULL                                       |
| Whateley  | Wilbur    |  53 | February 02, 1970  | NULL           |           1 | ["Wilbur Jr.","Eleanor"]                   |
| Denbrough | Bill      |  33 | January 04, 1990   | Audra Phillips |           1 | ["Stephen","Maya","Hunter"]                |
| Torance   | Jack      |  43 | June 20, 1980      | Wendy Torrance |           1 | ["Danny","Lucy"]                           |
| Wilkes    | Annie     |  51 | April 01, 1972     | NULL           |           0 | NULL                                       |
| Cady      | Max       |  33 | September 13, 1989 | NULL           |           0 | NULL                                       |
| White     | Carrie    |  22 | August 25, 2000    | NULL           |           0 | NULL                                       |
| Strode    | Laurie    |  36 | February 22, 1987  | NULL           |           1 | ["Jamie Lloyd","John Tate","Karen Nelson"] |
+-----------+-----------+-----+--------------------+----------------+-------------+--------------------------------------------+
Other Useful Pipeline Commands
To troubleshoot an empty table, run the command below. Be sure to add the correct pipeline name.
SELECT * FROM information_schema.pipelines_errors
WHERE pipeline_name = '<pipeline_name>';
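It can also be useful to check which files the pipeline has discovered and the load state of each one:
SELECT file_name, file_state
FROM information_schema.pipelines_files
WHERE pipeline_name = '<pipeline_name>';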
Stopping a pipeline is useful when you want to stop additional data from loading but would like to use the pipeline again.
STOP PIPELINE <pipeline_name>;
Dropping a pipeline is useful when the pipeline was created for a one-off data load and it will not be used in the future.
DROP PIPELINE <pipeline_name>;
Note
A pipeline does not need to be stopped in order to be dropped.
Last modified: September 27, 2023