Load Data in CSV Format from Amazon S3 Using a Pipeline
On this page
SingleStore Pipelines can read data from a comma-separated value (CSV) formatted file stored in an S3 bucket and insert these values into a destination table.
Prerequisites
The following prerequisites need to be in place before creating a pipeline using a CSV file:
-
A CSV file is available for upload to an S3 bucket.
-
An S3 bucket has been configured and has the proper permissions for access.
See Load Data from Amazon Web Services (AWS) S3 for more details about Amazon S3 Pipelines.
Create a CSV Pipeline
The following examples use the same dataset and table structure.
Create a Table
Create a table using the following syntax:
CREATE TABLE nautical_books(isbn text(13) CHARACTER SET utf8 COLLATE utf8_general_ci,title text(50) CHARACTER SET utf8 COLLATE utf8_general_ci,author text (50) CHARACTER SET utf8 COLLATE utf8_general_ci,year_published text(4) CHARACTER SET utf8 COLLATE utf8_general_ci);
The dataset used in this example is below.
isbn, book_title,Book_author,Year9780393972832,Moby Dick,Herman Melville,18519780006166269,Master and Commander,Patrick O’Brian,19699780192823182,The Riddle of the Sands,Erskine Childers,19039781421834016,Captain Blood,Rafael Sabatini,19229783104026886,Mr Midshipman Hornblower,CS Forester,19509780192829313,The Sea Wolf,Jack London,19049781857027204,The Perfect Storm,Sebastian Junger,19979780297866374,The Spanish Armada,Garrett Mattingly,19599788496957879,Mutiny on the Bounty,Charles Nordoff,19329780375757945,Two Years Before the Mast,Richard Henry Dana,18409780340203163,Shogun,James Clavell,19759787201046440,The Old Man and The Sea,Ernest Hemingway,19529780721405971,Treasure Island,Robert Louis Stevenson,18839780006172765,The Hunt for Red October,Tom Clancy,19849780316137492,The Terror,Dan Simmons,2007
Create a Pipeline with Minimal Information
Creating a pipeline using only the following information will result in the data not loading, and the table will be empty.
CREATE PIPELINE books ASLOAD DATA S3 's3://test-bucket/nautical_books.csv'CONFIG '{"region":"us-west-2"}'CREDENTIALS '{"aws_access_key_id": "XXXXXXXXXXXXXXX","aws_secret_access_key": "XXXXXXXXXXXXXXX"}'INTO TABLE nautical_books;START PIPELINE books;SELECT * FROM nautical_books;
Empty set (0.13 sec)
Troubleshooting Pipelines
The following command provides valuable information about why a running pipeline is not working as expected.
SELECT * FROM information_schema.pipelines_errorsWHERE pipeline_name = 'books';
+--------------+--------------+------------------------+------------+-------------+--------------------------------------------------------------------+------------+
|DATABASE_NAME |PIPELINE_NAME | ERROR_UNIX_TIMESTAMP | ERROR_TYPE | ERROR_CODE | ERROR_MESSAGE | ERROR_KIND |
+--------------+---------------------------------------+------------+-------------+--------------------------------------------------------------------+------------+
| tickets | books | 1675962208 | Warning | 2568 | Detected input line ending in '\r\n', | Load |
| | | | | | but FORMAT CSV is configured with LINES | |
| | | | | | TERMINATED BY '\n'. So '\r' will be included | |
| | | | | | in loaded data. Consider adding LINES TERMINATED BY '\r\n' if '\r' | |
| | | | | | should be part of the line terminator and stripped from the data. | |
| tickets | books | 1675962208 | Error | 1261 | Row 1 doesn't contain data for all columns | Load |
+--------------+--------------+------------------------+------------+-------------+--------------------------------------------------------------------+------------+
Note
The table above has been truncated for viewing purposes.
Create Pipeline with Escape Clauses
Creating a pipeline with the escape clauses will allow the data to load.
CREATE PIPELINE books ASLOAD DATA S3 's3://test-bucket/nautical_books.csv'CONFIG '{"region":"us-west-2"}'CREDENTIALS '{"aws_access_key_id": "XXXXXXXXXX","aws_secret_access_key": "XXXXXXXXXX"}'INTO TABLE nautical_booksFIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\'LINES TERMINATED BY '\r\n' STARTING BY ''START PIPELINE books;SELECT * FROM nautical_books;
+---------------+---------------------------+------------------------+----------------+
| isbn | title | author | year_published |
+---------------+---------------------------+------------------------+----------------+
| 9788496957879 | Mutiny on the Bounty | Charles Nordoff | 1932 |
| 9781421834016 | Captain Blood | Rafael Sabatini | 1922 |
| 9780006172765 | The Hunt for Red October | Tom Clancy | 1984 |
| 9787201046440 | The Old Man and The Sea | Ernest Hemingway | 1952 |
| 9780316137492 | The Terror | Dan Simmons | 2007 |
| 9780375757945 | Two Years Before the Mast | Richard Henry Dana | 1840 |
| 9780393972832 | Moby Dick | Herman Melville | 1851 |
| 9780006166269 | Master and Commander | Patrick O’Brian | 1969 |
| 9780192823182 | The Riddle of the Sands | Erskine Childers | 1903 |
| 9783104026886 | Mr Midshipman Hornblower | CS Forester | 1950 |
| 9780721405971 | Treasure Island | Robert Louis Stevenson | 1883 |
| 9781857027204 | The Perfect Storm | Sebastian Junger | 1997 |
| 9780192829313 | The Sea Wolf | Jack London | 1904 |
| 9780340203163 | Shogun | James Clavell | 1975 |
| 9780297866374 | The Spanish Armada | Garrett Mattingly | 1959 |
+---------------+---------------------------+------------------------+----------------+
Create Pipeline Using the Ignore Clause
Add an IGNORE <#> LINES
clause in the pipeline creation to skip over the headers or a set number of lines at the beginning of the file.
CREATE PIPELINE books ASLOAD DATA S3 's3://test-bucket/nautical_books.csv'CONFIG '{"region":"us-west-2"}'CREDENTIALS '{"aws_access_key_id": "XXXXXXXXXX","aws_secret_access_key": "XXXXXXXXXX"}'INTO TABLE nautical_booksFIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\'LINES TERMINATED BY '\r\n' STARTING BY ''IGNORE 1 LINES
+---------------+---------------------------+------------------------+----------------+
| isbn | title | author | year_published |
+---------------+---------------------------+------------------------+----------------+
| 9780006172765 | The Hunt for Red October | Tom Clancy | 1984 |
| 9783104026886 | Mr Midshipman Hornblower | CS Forester | 1950 |
| 9780721405971 | Treasure Island | Robert Louis Stevenson | 1883 |
| 9780316137492 | The Terror | Dan Simmons | 2007 |
| 9780375757945 | Two Years Before the Mast | Richard Henry Dana | 1840 |
| 9780340203163 | Shogun | James Clavell | 1975 |
| 9780192823182 | The Riddle of the Sands | Erskine Childers | 1903 |
| 9781421834016 | Captain Blood | Rafael Sabatini | 1922 |
| 9781857027204 | The Perfect Storm | Sebastian Junger | 1997 |
| 9788496957879 | Mutiny on the Bounty | Charles Nordoff | 1932 |
| 9787201046440 | The Old Man and The Sea | Ernest Hemingway | 1952 |
| 9780393972832 | Moby Dick | Herman Melville | 1851 |
| 9780297866374 | The Spanish Armada | Garrett Mattingly | 1959 |
| 9780006166269 | Master and Commander | Patrick O’Brian | 1969 |
| 9780192829313 | The Sea Wolf | Jack London | 1904 |
+---------------+---------------------------+------------------------+----------------+
Related Topics
Last modified: August 29, 2024