# How to Bulk Load Vectors

SingleStore recommends loading larger vector data sets using a binary format, such as [Apache Parquet](https://parquet.apache.org/) or [Apache Avro™](https://avro.apache.org/), and using [LOAD DATA](https://docs.singlestore.com/db/v9.1/reference/sql-reference/data-manipulation-language-dml/load-data.md) and [PIPELINE](https://docs.singlestore.com/db/v9.1/reference/sql-reference/pipelines-commands/create-pipeline.md)s. The benefits of using a binary format include:

* Faster load speed.

  * Loading from binary file formats uses less CPU than loading from JSON array of numbers format.
* Smaller input file size.

  * Vectors stored in binary format are typically much smaller than vectors stored in JSON array of numbers format.
* No loss of precision.

  * Using the JSON array of numbers format, you may lose precision due to rounding errors.

Thus, for high performance, SingleStore recommends using Parquet or Avro formats to load large vector data sets, if possible. If using a binary format is not feasible, SingleStore supports loading vectors encoded in hexadecimal, which is more efficient than JSON array format, but less efficient than Avro or Parquet.

SingleStore also supports input and output of vectors in JSON array of numbers format. This format is convenient and user-readable but has performance impacts.

Example 1 provides an end-to-end example of generating and loading binary vectors using Parquet file format and Python. Examples 2, 3, and 4 demonstrate loading vector data using Parquet, hexadecimal, and JSON array format, respectively. Using Avro is similar to using Parquet, except that the format should be specified as Avro and not Parquet. Example 5 shows exporting vector data using Parquet and hexadecimal encodings.

## Output Format for Examples

Vectors can be output in JSON or binary format. Use JSON format for examples and for output readability. For production, use the default binary for efficiency.

Use the following command to output vectors in JSON format.

```sql
SET vector_type_project_format = JSON;
```

Use the following command to set the output format back to binary.

```sql
SET vector_type_project_format = BINARY;
```

## Example 1 - Generate and Load Binary Vector Data in Parquet using Python

This example first generates a file containing binary vectors using [Parquet](https://parquet.apache.org/) file format. That file is loaded into a SingleStore table in two different ways: first, from S3 first using a [PIPELINE](https://docs.singlestore.com/db/v9.1/reference/sql-reference/pipelines-commands/create-pipeline.md) command and second, using a [LOAD DATA](https://docs.singlestore.com/db/v9.1/reference/sql-reference/data-manipulation-language-dml/load-data.md) command.

## Generate a Parquet File containing Binary Vector Data

Vector data in Parquet file format can be generated using Python and [pyarrow](https://arrow.apache.org/docs/python/index.html).

Install the `pyarrow` library, using `pip3` or using an installation command appropriate to your system.

```shell
pip3 install -q pyarrow numpy
```

Start the Python interpreter.

```shell
python3
```

The Python code below will generate a Parquet file that matches the schema of the `comments` table with the data shown in the table.

```sql
CREATE TABLE comments(id INT,
   comment TEXT,
   comment_embedding VECTOR(4),
   category VARCHAR(256));
```

| id | comment                                              | comment\_embedding        | category |
| -- | ---------------------------------------------------- | ------------------------- | -------- |
| 1  | "The cafeteria in building 35 has a great salad bar" | \[0.45, 0.55, 0.495, 0.5] | "Food"   |
| 2  | "I love the taco bar in the B16 cafeteria."          | \[0.01, 0.01, 0.1, 0.99]  | "Food"   |
| 3  | "The B24 restaurant salad bar is quite good."        | \[0.1, 0.8, 0.2, 0.555]   | "Food"   |

Note: In this table, the vector data is shown as a JSON array for readability; in the example, the data will be generated and loaded in binary format.

Run the following code in the python interpreter. This code will create a file named `comments.parquet`. Names ending in `_pqt` are used for column names in the parquet file so that the parquet column names can be distinguished from SingleStore column names.

```python
import pyarrow as pa
import pyarrow.parquet as pq
import numpy as np

# Define column names for Parquet file
columns = ['Id_pqt', 'Comment_pqt', 'Comment_embedding_pqt', 'Category_pqt']

# create some example binary vectors
emb1 = np.array([0.45, 0.55, 0.495, 0.5], dtype=np.float32).tobytes()
emb2 = np.array([0.01, 0.01, 0.1, 0.99], dtype=np.float32).tobytes()
emb3 = np.array([0.1, 0.8, 0.2, 0.555], dtype=np.float32).tobytes()

# Create the data arrays 
ids = pa.array([1,2,3], type=pa.int8())
comments = pa.array(["The cafeteria in building 35 has a great salad bar", "I love the taco bar in the B16 cafeteria.", "The B24 restaurant salad bar is quite good."], type=pa.string())
binary_vectors = pa.array([emb1, emb2, emb3], type=pa.binary())
categories = pa.array(["Food","Food","Food"],type=pa.string())
table = pa.Table.from_arrays([ids,comments,binary_vectors,categories],columns) 
pq.write_table(table, 'comments.parquet')
```

## Load the Parquet File using a Pipeline

Create the comments table in SingleStore. The `CREATE TABLE` statement for the comments table is repeated below for convenience.

```sql
CREATE TABLE comments(id INT,
   comment TEXT,
   comment_embedding VECTOR(4),
   category VARCHAR(256));
```

Create a pipeline to load the `comments.parquet` file into the `comments` table.

The following pipeline accesses a publicly available bucket which contains the `comments.parquet` file and can be run as is (without credentials).

```sql
CREATE PIPELINE parquet_comments_pipe
AS LOAD DATA S3 's3://singlestore-docs-example-datasets/vecs/comments.parquet'
CONFIG '{"region":"us-east-1"}'
INTO TABLE comments
    (id <- Id_pqt,
    comment <- Comment_pqt,
    @var <- Comment_embedding_pqt,
    category <- Category_pqt)
FORMAT PARQUET
SET comment_embedding = @var:>BLOB:>VECTOR(4, F32);
```

The `SET` statement converts the binary vector from the parquet file into a SingleStore `VECTOR` type. The `SET` statement does not do data conversion; it casts the data to a `VECTOR(4, F32)`, validates that the number of elements is as expected, and that the data does not contain any NAN/INF values.

Run the pipeline and verify the import succeeded.

```sql
START PIPELINE parquet_comments_pipe FOREGROUND;

```

View the values in the table to confirm the import succeeded.

```sql
SET vector_type_project_format = JSON; /* to make vector output human-readable */

SELECT * 
FROM comments;

```

```output

*** 1. row ***
               id: 1
          comment: The cafeteria in building 35 has a great salad bar
comment_embedding: [0.449999988,0.550000012,0.495000005,0.5]
         category: Food
*** 2. row ***
               id: 3
          comment: The B24 restaurant salad bar is quite good.
comment_embedding: [0.100000001,0.800000012,0.200000003,0.555000007]
         category: Food
*** 3. row ***
               id: 2
          comment: I love the taco bar in the B16 cafeteria.
comment_embedding: [0.00999999978,0.00999999978,0.100000001,0.99000001]
         category: Food
```

## Load the Parquet File using LOAD DATA

Load the `comments.parquet` file into the `comments` table using a [LOAD DATA](https://docs.singlestore.com/db/v9.1/reference/sql-reference/data-manipulation-language-dml/load-data.md) command. `LOAD DATA` may be used to load Parquet and Avro files from S3. The following command accesses a publicly available bucket which contains the `comments.parquet` file and can be run as is (without credentials).

```sql
LOAD DATA S3 's3://singlestore-docs-example-datasets/vecs/comments.parquet'
CONFIG '{"region":"us-east-1"}'
INTO TABLE comments
    (id <- Id_pqt,
    comment <- Comment_pqt,
    @emb <- Comment_embedding_pqt,
    category <- Category_pqt)
FORMAT PARQUET
SET comment_embedding = @emb:>BLOB:>VECTOR(4, F32);

```

View the values in the table to confirm the import succeeded.

```sql
SET vector_type_project_format = JSON; /* to make vector output human readable */

SELECT * 
FROM comments;

```

```output

*** 1. row ***
               id: 2
          comment: I love the taco bar in the B16 cafeteria.
comment_embedding: [0.00999999978,0.00999999978,0.100000001,0.99000001]
         category: Food
*** 2. row ***
               id: 1
          comment: The cafeteria in building 35 has a great salad bar
comment_embedding: [0.449999988,0.550000012,0.495000005,0.5]
         category: Food
*** 3. row ***
               id: 3
          comment: The B24 restaurant salad bar is quite good.
comment_embedding: [0.100000001,0.800000012,0.200000003,0.555000007]
         category: Food
```

The [LOAD DATA LOCAL](https://docs.singlestore.com/db/v9.1/reference/sql-reference/data-manipulation-language-dml/load-data/#UUID-ad611e13-3393-3958-01c4-68bbdd3f9404.md) command may be used to load from a local file.

## Example 2 - Bulk Loading Vector Data using Parquet

SingleStore supports loading vector data from Parquet files. SingleStore recommends loading from binary formats such as Parquet or Avro when possible.&#x20;

Create a table with a column of type `VECTOR` with 4 elements (dimension 4) and an element type of 32-bit floating-point number (`F32`). Note that in practice vectors typically have significantly more dimensions than 4, vectors with dimension 4 are used for readability.

```sql
CREATE TABLE vectors(id INT, vec VECTOR(4, F32) NOT NULL);
```

This example uses the following data. The vectors are shown in JSON array format for readability.

```
1,'[0.45, 0.55, 0.495, 0.5]'
2,'[0.1, 0.8, 0.2, 0.555]'
3,'[-0.5, -0.03, -0.1, 0.86]'
4,'[0.5, 0.3, 0.807, 0.1]'
```

## Example 2a - Using LOAD DATA to Load Parquet Data from S3

The following command loads a parquet file stored in S3 to the `vectors` table. This example uses [LOAD DATA](https://docs.singlestore.com/db/v9.1/reference/sql-reference/data-manipulation-language-dml/load-data.md) , which is more efficient than loading data using [INSERT](https://docs.singlestore.com/db/v9.1/reference/sql-reference/data-manipulation-language-dml/insert.md) statements. This command can be run as is because `s3://singlestore-docs-examples-datasets` is a public bucket.

```sql
LOAD DATA S3
's3://singlestore-docs-example-datasets/vecs/vec_f32_data.parquet'
CONFIG '{"region":"us-east-1"}'
CREDENTIALS '{}'
INTO TABLE vectors
FORMAT PARQUET
(id <- id, @v <-vec)
SET vec = @v:>BLOB:>VECTOR(4, F32);
```

Note the last line of this command which uses a set of casts to load Parquet data into the `VECTOR` attribute: `SET vec = @v:>BLOB:>VECTOR(4, F32)`. By default, `LOAD DATA` expects vector data to be in textual JSON array notation. The casts in the `SET` clause override this and allow more efficient assignment directly from binary data.

Run the following SQL statement to verify that the data was loaded.

```sql
SET vector_type_project_format = JSON;  /* to make vector output readable */

SELECT id, vec 
FROM vectors
ORDER BY id;


```

```output

+------+---------------------------------------------------+
| id   | vec                                               |
+------+---------------------------------------------------+
|    1 | [0.449999988,0.550000012,0.495000005,0.5]         |
|    2 | [0.100000001,0.800000012,0.200000003,0.555000007] |
|    3 | [-0.5,-0.0299999993,-0.100000001,0.860000014]     |
|    4 | [0.5,0.300000012,0.806999981,0.100000001]         |
+------+---------------------------------------------------+
```

The results do not exactly match the vector values listed at the beginning of this example because elements in the `VECTOR` data type are stored as floating-point numbers and the vector values are not perfectly representable in floating-point representation.

To experiment further and use your own data file, follow the instructions in the [Example 5 - Export Vector Data in Parquet and Hexadecimal](https://docs.singlestore.com/#section-idm4627199473104034253762196604.md) section.

## Example 2b - Using a Pipeline to Load Parquet Data from S3

You can also use a pipeline to load data from S3. This example is the same as Example 1a, except that a pipeline is used to load the data.

Create a pipeline to load the Parquet data into the `vectors` table.&#x20;

```sql
CREATE PIPELINE vec_pipeline_pqt AS
LOAD DATA S3
's3://singlestore-docs-example-datasets/vecs/vec_f32_data.parquet'
CONFIG '{"region":"us-east-1"}'
CREDENTIALS '{}'
INTO TABLE vectors
FORMAT PARQUET
(id <- id, @v <-vec)
SET vec = @v:>BLOB:>VECTOR(4, F32);
```

This command uses `CREATE PIPELINE` and will result in an error if the pipeline already exists. If you want to update a pipeline that exists and may be running, use the syntax `CREATE OR REPLACE PIPELINE`. Refer to [CREATE PIPELINE](https://docs.singlestore.com/db/v9.1/reference/sql-reference/pipelines-commands/create-pipeline.md) for more information.

Start the pipeline.

```sql
START PIPELINE vec_pipeline_pqt FOREGROUND;
```

The command above starts a pipeline in the `FOREGROUND` so that errors will be displayed in the client. If you are using pipelines that will run continuously, consider starting pipelines in the background by omitting the `FOREGROUND` keyword. Refer to [START PIPELINE](https://docs.singlestore.com/db/v9.1/reference/sql-reference/pipelines-commands/start-pipeline.md) for more information.

Once the pipeline has completed, verify that the data has been loaded.

```sql
SET vector_type_project_format = JSON;  /* to make vector output readable */

SELECT id, vec 
FROM vectors
ORDER BY id;


```

```output

+------+---------------------------------------------------+
| id   | vec                                               |
+------+---------------------------------------------------+
|    1 | [0.449999988,0.550000012,0.495000005,0.5]         |
|    2 | [0.100000001,0.800000012,0.200000003,0.555000007] |
|    3 | [-0.5,-0.0299999993,-0.100000001,0.860000014]     |
|    4 | [0.5,0.300000012,0.806999981,0.100000001]         |
+------+---------------------------------------------------+
```

Once a pipeline is no longer needed, it can be dropped.&#x20;

```sql
DROP PIPELINE vec_pipeline_pqt;
```

If you have started a pipeline in the background (by omitting the `FOREGROUND` keyword), you can stop the pipeline instead of, or before, dropping it. Stopping a pipeline allows the pipeline to be used later, dropping a pipeline removes the pipeline from the system. Stopping is applicable for background pipelines only. Refer to [STOP PIPELINE](https://docs.singlestore.com/db/v9.1/reference/sql-reference/pipelines-commands/stop-pipeline.md) and [DROP PIPELINE](https://docs.singlestore.com/db/v9.1/reference/sql-reference/pipelines-commands/drop-pipeline.md) for more information.

## Example 2c - Loading Data in Parquet Format from a Local File

Use the following SQL statement to load Parquet data from a local file into the `vectors` table. For this example, place the Parquet file in a location accessible to your SQL client process and replace the path, `'/tmp/vec_f32_data.parquet'`, with the path to your text file.

```sql
LOAD DATA LOCAL INFILE '/tmp/vec_f32_data.parquet'
INTO TABLE vectors
FORMAT PARQUET
(id <- id, @v <-vec)
SET vec = @v:>BLOB:>VECTOR(4, F32);
```

Refer to [LOAD DATA](https://docs.singlestore.com/db/v9.1/reference/sql-reference/data-manipulation-language-dml/load-data.md) for more information on using `LOAD DATA LOCAL`.

To generate a Parquet file to use with this command, follow the instructions in the [Example 5 - Export Vector Data in Parquet and Hexadecimal](https://docs.singlestore.com/#section-idm4627199473104034253762196604.md) section.

## Example 3 - Bulk Loading Vector data using Hexadecimal

SingleStore supports loading vector data from CSV files containing the hexadecimal encodings of vectors. SingleStore recommends this option when writing vectors into a binary format like Parquet or Avro is inconvenient.

Data for the vectors from Example 1 in CSV format with the vectors encoded in hexadecimal is shown below. The examples below demonstrate how to load this  data into SingleStore.

```
1, 6666E63ECDCC0C3FA470FD3E0000003F
2, CDCCCC3DCDCC4C3FCDCC4C3E7B140E3F
3, 000000BF8FC2F5BCCDCCCCBDF6285C3F
4, 0000003F9A99993E8D974E3FCDCCCC3D 
```

## Example 3a - Using a Pipeline to Load Data in Hexadecimal from S3

This example uses the `vectors` table created earlier, so before running this example, delete data from the `vectors` table to prevent duplicate records.

```sql
TRUNCATE vectors;
```

The following shows loading the data with vectors encoded in hexadecimal into the `vectors` table.

> **📝 Note**: For loading data in CSV format from S3, a pipeline must be used with `LOAD DATA`. `LOAD DATA S3` works without a pipeline only for Parquet or Avro formats.

The command below can be run as is because `s3://singlestore-docs-example-datasets/` is a public bucket.

```sql
CREATE PIPELINE vec_pipeline_hex AS
LOAD DATA S3
's3://singlestore-docs-example-datasets/vecs/vec_f32_data_hex.csv'
CONFIG '{"region":"us-east-1"}'
CREDENTIALS '{}'
INTO TABLE vectors
FORMAT CSV(id, @v)
SET vec = UNHEX(@v):>VECTOR(4,F32);
```

Note the last line of this command uses the [UNHEX](https://docs.singlestore.com/db/v9.1/reference/sql-reference/string-functions/unhex.md) function and a cast to load hexadecimal data into a `VECTOR` attribute.

`FORMAT CSV` is the default and is not required in this query. We include it for clarity and to differentiate from the query for loading Parquet data. In addition, note that the column labelling format for CSV, `(id, @v)` in this case, is different than the column labelling format for Parquet `(id ->- id, @v -> vec)` in the previous example. Refer to [LOAD DATA](https://docs.singlestore.com/db/v9.1/reference/sql-reference/data-manipulation-language-dml/load-data.md) for more detail.

Start the pipeline.

```sql
START PIPELINE vec_pipeline_hex FOREGROUND;
```

Once the data load has completed, verify that the data was loaded using the following SQL.

```sql
SET vector_type_project_format = JSON;  /* to make vector output readable */

SELECT * 
FROM vectors
ORDER BY id;


```

```output

+------+---------------------------------------------------+
| id   | vec                                               |
+------+---------------------------------------------------+
|    1 | [0.449999988,0.550000012,0.495000005,0.5]         |
|    2 | [0.100000001,0.800000012,0.200000003,0.555000007] |
|    3 | [-0.5,-0.0299999993,-0.100000001,0.860000014]     |
|    4 | [0.5,0.300000012,0.806999981,0.100000001]         |
+------+---------------------------------------------------+
```

The results do not exactly match the vector values listed, because elements in the `VECTOR` data type are stored as floating-point numbers and the vector values are not perfectly representable in floating-point representation.

Once a pipeline is no longer needed it can be dropped.&#x20;

```sql
DROP PIPELINE vec_pipeline_hex;
```

If you have started a pipeline in the background (by omitting the `FOREGROUND` keyword), you can stop the pipeline instead of, or before, dropping it. Stopping a pipeline allows the pipeline to be used later, dropping a pipeline removes the pipeline from the system. Stopping is applicable for background pipelines only. Refer to [STOP PIPELINE](https://docs.singlestore.com/db/v9.1/reference/sql-reference/pipelines-commands/stop-pipeline.md) and [DROP PIPELINE](https://docs.singlestore.com/db/v9.1/reference/sql-reference/pipelines-commands/drop-pipeline.md) for more information.

To experiment with your own data files and bucket, refer to [Example 5 - Export Vector Data in Parquet and Hexadecimal](https://docs.singlestore.com/#section-idm4627199473104034253762196604.md) and [Example 6 - Generate Vector Data](https://docs.singlestore.com/#section-idm234545992427085.md).

## Example 3b - Loading Data in Hexadecimal from a Local File

Use the following SQL statement to load the data with vectors encoded in hexadecimal into the `vectors` table. For this example, place the hexadecimal data show at the beginning of this section in a local file accessible to your SQL client and replace `'/tmp/vec_f32_data_hex.csv'` with the path to your text file.

As with the example above, the command below uses UNHEX and a cast to load hexadecimal data into a `VECTOR` attribute.

```sql
LOAD DATA LOCAL INFILE '/tmp/vec_f32_data_hex.csv'
INTO TABLE vectors
FORMAT CSV
(id, @v)
SET vec = UNHEX(@v):>VECTOR(4,F32);

```

## Example 4 - Loading Vector Data in JSON Array Format

Create a table with an attribute of type `VECTOR` with 3 elements and element type 16-bit integer (`I16`).&#x20;

```sql
CREATE TABLE vectors_i16 (id INT,vec VECTOR(3, I16));
```

Sample data for the `vectors_i16` table in JSON array format is below.&#x20;

```
1,'[1,2,3]'
2,'[4,5,6]'
3,'[1,4,8]'
```

## Example 4a - Using a PIPELINE to Load Vector Data in JSON Format from S3

The `PIPELINE`  below loads this data into the `vectors_i16` table. The command can be run as is because `s3://singlestore-docs-example-datasets/` is a public bucket.

```sql
CREATE PIPELINE vec_i16_pipeline AS
LOAD DATA S3
's3://singlestore-docs-example-datasets/vecs/vec_i16_data.csv'
CONFIG '{"region":"us-east-1"}'
CREDENTIALS '{}'
INTO TABLE vectors_i16
FIELDS TERMINATED BY ','
ENCLOSED BY "'"
FORMAT CSV;
```

Start the pipeline.

```sql
START PIPELINE vec_i16_pipeline FOREGROUND;
```

Once the pipeline has completed, validate that the data loaded using the following SQL.

```sql
SET vector_type_project_format = JSON;  /* to make vector output readable */

SELECT vec FROM vectors_i16;

```

```output

+---------+
| vec     |
+---------+
| [1,2,3] |
| [4,5,6] |
| [1,4,8] |
+---------+
```

Once a pipeline is no longer needed it can be dropped.&#x20;

```sql
DROP PIPELINE vec_i16_pipeline;
```

If you have started a pipeline in the background (by omitting the `FOREGROUND` keyword), you can stop the pipeline instead of, or before, dropping it. Stopping a pipeline allows the pipeline to be used later, dropping a pipeline removes the pipeline from the system. Stopping is applicable for background pipelines only. Refer to [STOP PIPELINE](https://docs.singlestore.com/db/v9.1/reference/sql-reference/pipelines-commands/stop-pipeline.md) and [DROP PIPELINE](https://docs.singlestore.com/db/v9.1/reference/sql-reference/pipelines-commands/drop-pipeline.md) for more information.

## Example 4b - Loading data in JSON Format from a Local File

Use the following SQL to load the data with vectors in JSON array format into the `vectors_i16` table. For this example, place the data in a local file accessible to your SingleStore client and replace `'/tmp/vec_i16_data.csv'` with the path to your text file.

```sql
LOAD DATA LOCAL INFILE '/tmp/vec_i16_data.csv'
INTO TABLE vectors_i16
FIELDS TERMINATED BY ','
ENCLOSED BY "'"
FORMAT CSV;

```

Note that lines `FIELDS TERMINATED BY ','` and `ENCLOSED BY "''"` clause tells the database that the fields in the data file are separated by commas and that the fields may be enclosed with single quotes. This allows the database to load the data as formatted above.

## Example 5 - Export Vector Data in Parquet and Hexadecimal

To experiment further with loading data in Parquet and hexadecimal format, use these instructions in this section to generate a data file and place that file in cloud storage location such as S3. Then, run the pipelines above by by updating the S3 location and adding your config and credentials.

First create the `vectors` table and insert data using the commands below.

```sql
CREATE TABLE vectors (id int, vec VECTOR(4) not null);

INSERT INTO vectors VALUES (1, '[0.45, 0.55, 0.495, 0.5]');
INSERT INTO vectors VALUES (2, '[0.1, 0.8, 0.2, 0.555]');
INSERT INTO vectors VALUES (3, '[-0.5, -0.03, -0.1, 0.86]');
INSERT INTO vectors VALUES (4, '[0.5, 0.3, 0.807, 0.1]');
```

## Example 5a - Export Vector Data in Parquet Format

When exporting vector data in Parquet, or other binary format, set `vector_type_project_format` to `BINARY`.

To export into a Parquet file on S3, use the following command. You will need to replace the S3 URI and the configuration and credentials with your own values. Refer to [SELECT](https://docs.singlestore.com/db/v9.1/query-data/select.md) for additional details on AWS configuration and credentials in SingleStore.

```sql
SET vector_type_project_format = BINARY;

SELECT id, vec
FROM vectors
INTO S3 's3://vec_data_folder/vec_f32_data.parquet'
CONFIG '{"region":"us-west-2"}'
CREDENTIALS '{
     "aws_access_key_id":"your_access_key_id",
     "aws_secret_access_key":"your_secret_access_key",
     "aws_session_token":"your_session_token"}'
FORMAT PARQUET;
```

You can output data in Parquet format to a local file using the following command.

```sql
SET vector_type_project_format = BINARY;

SELECT id, vec 
FROM vectors
INTO outfile "/tmp/vec_f32_data.parquet"
FORMAT PARQUET;
```

## Example 5b - Export Vector Data in Hexadecimal Format

To export the `vectors` table with vectors encoded in hexadecimal on S3, use the command below. Again, you will need to replace the S3 URI, configuration, and credentials with your own values.&#x20;

```sql
SELECT id, HEX(vec) AS vec
FROM vectors
INTO S3 's3://vec_data_folder/vec_f32_data_hex.csv'
CONFIG '{"region":"us-west-2"}'
CREDENTIALS '{
     "aws_access_key_id":"your_access_key_id",
     "aws_secret_access_key":"your_secret_access_key",
     "aws_session_token":"your_session_token"}';
```

You can output the data with vectors encoded in hexadecimal to a local CSV file using the following command.

```sql
SELECT id, HEX(vec) AS vec
FROM vectors
INTO outfile "/tmp/vec_f32_data_hex.csv";

```

## Example 6 - Generate Vector Data

Vector data in Parquet file format can be generated using Python and [pandas](https://pandas.pydata.org/). The `pandas` library provides a function for exporting a `pandas` `DataFrame` to a Parquet file.

To encode a vector in hexadecimal format, convert the little-endian binary representation of each floating-point element of the vector to its 8-character hexadecimal encoding and concatenate the results. For example, the following Python code produces the hexadecimal encoding of a vector represented as a `numpy` array:

```python
v = numpy.array([0.45, 0.55, 0.495, 0.5], dtype=numpy.float32)
v.tobytes(order='C').hex()

'6666e63ecdcc0c3fa470fd3e0000003f' 
```

To experiment further with loading hexadecimal data, use this code to generate the hexadecimal data and place the data in cloud storage location such as S3. The pipelines above can be run with data you generate data by updating the S3 location and adding your config and credentials.&#x20;

> **❗ Important**: String functions, such as [CONCAT](https://docs.singlestore.com/db/v9.1/reference/sql-reference/string-functions/concat.md), [LENGTH](https://docs.singlestore.com/db/v9.1/reference/sql-reference/string-functions/length.md), [SUBSTRING](https://docs.singlestore.com/db/v9.1/reference/sql-reference/string-functions/substring.md), and [HEX](https://docs.singlestore.com/db/v9.1/reference/sql-reference/string-functions/hex.md) operate on the binary representation of a vector. To use a string function on the JSON string representation of a vector, write code as follows, where `vec` is the name of a column of type `VECTOR`.```sql
> STRINGFUNC(vec :> JSON)
> ```

## Related Topics

[Vector Type](https://docs.singlestore.com/db/v9.1/reference/sql-reference/data-types/vector-type.md)

[Working with Vector Data](https://docs.singlestore.com/db/v9.1/developer-resources/functional-extensions/working-with-vector-data.md)

[LOAD DATA](https://docs.singlestore.com/db/v9.1/reference/sql-reference/data-manipulation-language-dml/load-data.md)

[CREATE PIPELINE](https://docs.singlestore.com/db/v9.1/reference/sql-reference/pipelines-commands/create-pipeline.md)

***

Modified at: November 6, 2025

Source: [/db/v9.1/developer-resources/functional-extensions/how-to-bulk-load-vectors/](https://docs.singlestore.com/db/v9.1/developer-resources/functional-extensions/how-to-bulk-load-vectors/)

(An index of the documentation is available at /llms.txt)
