# Connect with Haystack

[Haystack](https://docs.haystack.deepset.ai/) by [Deepset](https://www.deepset.ai/) is an open-source framework for building search and retrieval-augmented generation (RAG) applications. The `singlestore-haystack` library enables you to integrate your SingleStore database as a [Document Store](https://docs.haystack.deepset.ai/docs/document-store) in Haystack to store and index documents and their metadata. At query time, Retriever components fetch the relevant documents from the Document Store and pass them to downstream pipeline components for additional processing.

The `singlestore-haystack` library uses the [SingleStore Python client](https://docs.singlestore.com/db/v9.1/developer-resources/connect-with-application-development-tools/connect-with-python/connect-using-the-singlestore-python-client.md) to interact with the SingleStore database. Refer to the [singlestore-haystack](https://github.com/singlestore-labs/singlestore-haystack) GitHub repository for its source code and related information.

## Install singlestore-haystack

The `singlestore-haystack` library can be installed using the standard Python package installation process:

```shell
pip install singlestore-haystack
```

## Data Storage Model

`SingleStoreDocumentStore` stores documents as rows in a SingleStore table. Vector embeddings are stored in a [VECTOR](https://docs.singlestore.com/db/v9.1/reference/sql-reference/data-types/vector-type.md) type column in the table.

`SingleStoreDocumentStore` automatically creates the required vector and full-text indexes if they do not already exist. When using `SingleStoreEmbeddingRetriever`, documents must be embedded before they are written to the database. Use a Haystack embedder to generate these embeddings. For example, use the `SentenceTransformersDocumentEmbedder` in an indexing pipeline to generate document embeddings before storing them in SingleStore.

The following is a visual representation:

![](https://images.contentstack.io/v3/assets/bltac01ee6daa3a1e14/blt14a3939312b8a042/6a3e1348d052f5c7e58576bf/haystack-2f9Zir.png)

In this infographic:

* **Haystack table** is a SingleStore table used by `SingleStoreDocumentStore` to persist Haystack Document objects as rows.
* **embedding** is a property of the document, which is stored as a vector of type `VECTOR(n, F32)`.
* **content** is a property of the document.
* **vector indexes** are SingleStore vector indexes created on the embedding column to enable efficient search for dense retrieval.
* **fulltext index** is a SingleStore full-text index created on the content column to support BM25-based sparse retrieval.
* `write_documents` represents the insert operation where `SingleStoreDocumentStore` stores documents in the table.
* `retrieve_documents` represents the retrieval operations run by retrievers, such as `SingleStoreEmbeddingRetriever` (for vector search) and `SingleStoreBM25Retriever` (for full-text search).

For example, consider the following code:

```python
from haystack import Document
from haystack_integrations.document_stores.singlestore_haystack import SingleStoreDocumentStore

# Initialize the document store (uses S2_CONN_STR by default)
document_store = SingleStoreDocumentStore(
    database_name="haystack_db",
    table_name="haystack_documents",
    embedding_dimension=384,
)

# Each Document becomes a row in the SingleStore table
documents = [
    Document(
        content="SingleStore is a distributed SQL database built to power intelligent applications.",
        embedding=[0.1] * 384,  # VECTOR(384, F32) column
        meta={
            "num_of_years": 3,   # stored as JSON/metadata column
        },
    )
]

# Insert documents into SingleStore
document_store.write_documents(documents)
```
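Because the embedding column has a fixed dimension (`VECTOR(n, F32)`), a document that has no embedding or was embedded with a different model does not fit the table. The following is a minimal sketch of a pre-write check. `find_invalid_embeddings` is a hypothetical helper, not part of `singlestore-haystack`; it relies only on the `embedding` attribute that Haystack `Document` objects expose, and uses stand-in objects so that it runs without a database.

```python
from types import SimpleNamespace
from typing import Iterable, List


def find_invalid_embeddings(documents: Iterable, expected_dim: int) -> List[int]:
    """Return positions of documents whose embedding is missing or has the wrong length."""
    invalid = []
    for position, doc in enumerate(documents):
        embedding = getattr(doc, "embedding", None)
        if embedding is None or len(embedding) != expected_dim:
            invalid.append(position)
    return invalid


# Stand-in objects with the same `embedding` attribute as a Haystack Document,
# used only so this sketch runs without Haystack or a SingleStore cluster.
docs = [
    SimpleNamespace(content="embedded with a 384-dimension model", embedding=[0.1] * 384),
    SimpleNamespace(content="not embedded yet", embedding=None),
    SimpleNamespace(content="embedded with a 768-dimension model", embedding=[0.1] * 768),
]

invalid_positions = find_invalid_embeddings(docs, expected_dim=384)
print(invalid_positions)  # [1, 2]
```

Running a check like this before `write_documents()` surfaces mismatches in application code instead of at insert time.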

## Supported Components

This library implements the [DocumentStore protocol](https://docs.haystack.deepset.ai/docs/document-store#documentstore-protocol) methods; import the `SingleStoreDocumentStore` implementation as follows:

```python
from haystack_integrations.document_stores.singlestore_haystack import SingleStoreDocumentStore
```

In addition to `SingleStoreDocumentStore`, the `singlestore-haystack` library includes the following Haystack [Retriever](https://docs.haystack.deepset.ai/docs/retrievers) components that can be used in a pipeline:

* `SingleStoreEmbeddingRetriever`: Queries the SingleStore vector index to find semantically related documents. This component uses `SingleStoreDocumentStore` to perform vector similarity search over the stored vector embeddings.
  ```python
  from haystack_integrations.components.retrievers.singlestore_haystack import SingleStoreEmbeddingRetriever
  ```
* `SingleStoreBM25Retriever`: Performs sparse retrieval using the BM25 ranking algorithm. It leverages SingleStore full-text search (FTS) capabilities to retrieve documents based on keyword relevance instead of vector similarity (embeddings). This component uses `SingleStoreDocumentStore` to execute BM25 queries. SingleStore recommends using this component for keyword-based and hybrid search scenarios.
  ```python
  from haystack_integrations.components.retrievers.singlestore_haystack import SingleStoreBM25Retriever
  ```
  You can specify either of the following scoring functions:

  * BM25
  * BM25\_GLOBAL

  Refer to [BM25](https://docs.singlestore.com/db/v9.1/reference/sql-reference/full-text-search-functions/bm-25.md) for more information. For example:
  ```python
  retriever = SingleStoreBM25Retriever(document_store=document_store)
  results = retriever.run(
      query="database",
      top_k=2,
      bm25_function="BM25",
  )["documents"]
  ```
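To build intuition for what BM25 ranking rewards, the following sketch implements the textbook BM25 formula in plain Python. It is an illustration only: it uses whitespace tokenization and the commonly cited defaults `k1=1.2` and `b=0.75` (assumptions), and it does not reproduce SingleStore's analyzers, scoring internals, or the difference between `BM25` and `BM25_GLOBAL`.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, corpus: List[str], k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score each document in a non-empty corpus against the query with classic BM25."""
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: number of documents that contain each term
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            # Rare terms get a higher inverse document frequency
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            # Term frequency saturates (k1) and is normalized by document length (b)
            norm = tf + k1 * (1.0 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1.0) / norm
        scores.append(score)
    return scores


corpus = [
    "SingleStore is a distributed SQL database",
    "Haystack builds retrieval pipelines",
    "a database stores rows and a database serves queries",
]
scores = bm25_scores("database", corpus)
print(scores)
```

Documents that do not contain the query term score zero, and repeated occurrences raise the score with diminishing returns.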

## Use SingleStore as a Document Store

## Prerequisites

Ensure the following prerequisites are met before running the examples in this section:

* A SingleStore cluster is active.
* The `singlestore-haystack` package is installed.
* (Optional) The `sentence-transformers` Python library is installed. It provides the pre-trained models used in these examples to generate vector embeddings.
  ```shell
  pip install sentence-transformers
  ```

## Configure the Connection to SingleStore

To keep the credentials out of the source code, assign the connection string to the `S2_CONN_STR` environment variable in the following format:

```shell
export S2_CONN_STR="singlestoredb://<username>:<password>@<hostname>:<port>/[<database>]"
```

where,

* `hostname`: IP address or hostname of the SingleStore cluster.
* `port`: Port of the SingleStore cluster. The default is `3306`.
* `username`: Username of the SingleStore database user.
* `password`: Password for the SingleStore database user.
* `database`: (Optional) Name of the SingleStore database to connect with.

Alternatively, specify the connection configuration while instantiating the class:

```python
document_store = SingleStoreDocumentStore(
    host="<hostname>",
    port=<port>,
    username="<username>",
    password="<password>",
    database_name="<database>",
    table_name="<table>"  # Name of the SingleStore table used to store Documents
)
```
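If the connection string is assembled in code, for example from a secrets manager, characters such as `@` or `/` in the password can break the URL. The following sketch builds the `S2_CONN_STR` value with percent-encoded credentials. `build_conn_str` is a hypothetical helper, the credentials are placeholders, and the sketch assumes that the client percent-decodes the user and password fields, as URL-based drivers commonly do.

```python
import os
from typing import Optional
from urllib.parse import quote, unquote, urlparse


def build_conn_str(username: str, password: str, hostname: str,
                   port: int = 3306, database: Optional[str] = None) -> str:
    """Assemble a singlestoredb:// URL, percent-encoding the credentials."""
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    url = f"singlestoredb://{user}:{pwd}@{hostname}:{port}/"
    if database:
        url += quote(database, safe="")
    return url


# Placeholder values for illustration only
conn_str = build_conn_str("admin", "p@ss/word", "localhost", database="haystack_db")

# Visible to this process and its children only; it does not change the parent shell
os.environ["S2_CONN_STR"] = conn_str
print(conn_str)  # singlestoredb://admin:p%40ss%2Fword@localhost:3306/haystack_db

# Round-trip check: the encoded URL parses back to the original parts
parsed = urlparse(conn_str)
print(parsed.hostname, parsed.port, unquote(parsed.password))
```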

## Configure Indexes

`SingleStoreDocumentStore` supports creating and customizing indexes on the SingleStore table. Refer to [Working with Vector Data](https://docs.singlestore.com/db/v9.1/developer-resources/functional-extensions/working-with-vector-data.md) for more information. Based on the retrieval strategy, enable or disable specific index types and configure the index accordingly. For example:

```python
from haystack_integrations.document_stores.singlestore_haystack import SingleStoreDocumentStore

document_store = SingleStoreDocumentStore(
    database_name="haystack_db",
    table_name="haystack_documents",
    embedding_dimension=768,

    # Enable FULLTEXT index for keyword/BM25 search
    use_fulltext_index=True,
    fulltext_index_options={
        "analyzer": "standard",
    },

    # Enable vector index optimized for dot product similarity
    use_dot_product_vector_index=True,
    dot_product_vector_index_options={
        "nlist": 128,
    },

    # Optionally disable Euclidean-distance index if not needed
    use_euclidian_distance_vector_index=False,
)
```

Specify the following options as applicable when instantiating a `SingleStoreDocumentStore` object:

## Dot Product Optimized Vector Index

| Option | Description |
| --- | --- |
| `use_dot_product_vector_index` | Creates a vector index using dot product similarity. |
| `dot_product_vector_index_options` | Specifies a dictionary that contains options for configuring the vector index that uses dot product similarity. These options are forwarded to SingleStore. Refer to [Vector Index Options](https://docs.singlestore.com/db/v9.1/reference/sql-reference/vector-functions/vector-indexing/#section-idm457710817120003408966071884.md) for information on supported options. |

## Euclidean Distance Optimized Vector Index

| Option | Description |
| --- | --- |
| `use_euclidian_distance_vector_index` | Creates a vector index using Euclidean (L2) distance similarity. |
| `euclidian_distance_vector_index_options` | Specifies a dictionary that contains additional options for configuring the vector index that uses Euclidean distance similarity. These options are forwarded to SingleStore. Refer to [Vector Index Options](https://docs.singlestore.com/db/v9.1/reference/sql-reference/vector-functions/vector-indexing/#section-idm457710817120003408966071884.md) for information on supported options. |
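When deciding which vector index to enable, it can help to recall how the two metrics relate. For unit-length vectors, the squared Euclidean distance equals `2 - 2 * dot_product`, so both metrics rank candidates identically; for unnormalized vectors the rankings can differ. The following sketch checks this numerically in plain Python, independent of SingleStore. The vectors are arbitrary illustrative values.

```python
import math
from typing import List, Sequence


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a non-zero vector to unit length."""
    length = math.sqrt(dot_product(v, v))
    return [x / length for x in v]


query = normalize([1.0, 2.0, 3.0])
candidates = [normalize(v) for v in ([3.0, 2.0, 1.0], [1.0, 2.0, 2.5], [-1.0, 0.5, 0.0])]

# Higher dot product means more similar; lower distance means more similar
by_dot = sorted(range(len(candidates)), key=lambda i: -dot_product(query, candidates[i]))
by_l2 = sorted(range(len(candidates)), key=lambda i: euclidean_distance(query, candidates[i]))
print(by_dot, by_l2)  # [1, 0, 2] [1, 0, 2]

# Identity for unit vectors: ||a - b||^2 == 2 - 2 * (a . b)
for c in candidates:
    assert abs(euclidean_distance(query, c) ** 2 - (2 - 2 * dot_product(query, c))) < 1e-9
```

If the embedding model already produces normalized vectors, either metric yields the same ordering, and a single vector index is likely sufficient.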

## Full Text Index

> **📝 Note**: The full-text index is required for keyword-based retrieval using the `SingleStoreBM25Retriever`.

| Option | Description |
| --- | --- |
| `use_fulltext_index` | Creates a full-text index (version 2). |
| `fulltext_index_options` | Specifies a dictionary that contains additional options for configuring the full-text index. These options are forwarded to SingleStore. Refer to [Working with Full-Text Search](https://docs.singlestore.com/db/v9.1/developer-resources/functional-extensions/working-with-full-text-search.md) for information on supported options. |

## Hybrid Retrieval

To support hybrid retrieval, enable both the vector and full-text indexes at the same time. This allows a single Haystack pipeline to combine dense (semantic) and sparse (keyword-based) search.
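A hybrid pipeline needs a way to merge the dense and sparse result lists into one ranking. Reciprocal rank fusion (RRF) is one widely used technique; the following is a minimal sketch in plain Python that operates on document IDs. It is not taken from `singlestore-haystack`. The document IDs are made up, and `k=60` is the constant commonly used with RRF. Haystack's `DocumentJoiner` component offers a comparable join mode; refer to its documentation for details.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge ranked lists of document IDs; each list contributes 1 / (k + rank)."""
    fused: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(fused, key=lambda doc_id: fused[doc_id], reverse=True)


dense_ranking = ["doc_a", "doc_b", "doc_c"]   # for example, from an embedding retriever
sparse_ranking = ["doc_b", "doc_d", "doc_a"]  # for example, from a BM25 retriever

fused_ranking = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused_ranking)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that appear in both lists (`doc_a`, `doc_b`) rise above documents found by only one retriever, which is the behavior hybrid search usually aims for.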

## Write Documents

To write documents to SingleStore, use either of the following:

* `SingleStoreDocumentStore.write_documents()` method
* [DocumentWriter](https://docs.haystack.deepset.ai/docs/documentwriter) component

## write\_documents() Example

The following example generates the embeddings using `SentenceTransformersDocumentEmbedder` and then writes the document to SingleStore.

```python
from haystack import Document
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack_integrations.document_stores.singlestore_haystack import SingleStoreDocumentStore

# Initialize the document store
document_store = SingleStoreDocumentStore(
    database_name="haystack_db",        # SingleStore database
    table_name="haystack_documents",    # SingleStore table for Documents
    embedding_dimension=384,            # Dimension of embeddings
)

# Create documents
documents = [
    Document(content="SingleStore is a distributed SQL database built to power intelligent applications.")
]

# Create the document embedder
document_embedder = SentenceTransformersDocumentEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2"
)

# Download the model and prepare it (first run only)
document_embedder.warm_up()

# Generate embeddings
result = document_embedder.run(documents)
documents_with_embeddings = result["documents"]

# Write documents (with embeddings) to SingleStore
document_store.write_documents(documents_with_embeddings)
```

## DocumentWriter Example

The following example creates a Haystack pipeline to write documents to SingleStore:

```python
from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.writers import DocumentWriter
from haystack_integrations.document_stores.singlestore_haystack import SingleStoreDocumentStore

# Input documents
documents = [
    Document(content="SingleStore is a distributed SQL database built to power intelligent applications."),
    Document(content="SingleStore is delivered as a SaaS data platform (SingleStore Helios) and is available in AWS, Azure, and GCP."),
]

# Initialize the document store
document_store = SingleStoreDocumentStore(
    table_name="haystack_documents",
    embedding_dimension=384,
    recreate_table=True,   # Recreate the table if it already exists
)

# Components
embedder = SentenceTransformersDocumentEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2"
)
writer = DocumentWriter(document_store=document_store)

# Build the pipeline
pipeline = Pipeline()
pipeline.add_component(instance=embedder, name="embedder")
pipeline.add_component(instance=writer, name="writer")
pipeline.connect("embedder", "writer")

# Run the indexing pipeline
result = pipeline.run({"embedder": {"documents": documents}})
print(result)  # {'writer': {'documents_written': 2}}

```

```output
{'writer': {'documents_written': 2}}
```

## Retrieve Documents

Use the `SingleStoreEmbeddingRetriever` component to retrieve documents from SingleStore.

For example, consider the following Haystack pipeline that finds documents using the vector index and [metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering):

```python
from typing import List

from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder

from haystack_integrations.components.retrievers.singlestore_haystack import SingleStoreEmbeddingRetriever
from haystack_integrations.document_stores.singlestore_haystack import SingleStoreDocumentStore

# Initialize the document store
document_store = SingleStoreDocumentStore(
    database_name="haystack_db",  # The name of the database in SingleStore
    table_name="haystack_documents",  # The name of the table to store Documents
    embedding_dimension=384,  # The dimension of the embeddings being stored
    recreate_table=True,
)

# Sample documents with metadata
documents = [
    Document(content="My name is Morgan and I live in Paris.", meta={"num_of_years": 3}),
    Document(content="I am Susan and I live in Berlin.", meta={"num_of_years": 7}),
]

# The same model is used for both query and Document embeddings
model_name = "sentence-transformers/all-MiniLM-L6-v2"

# Embed and write documents
document_embedder = SentenceTransformersDocumentEmbedder(model=model_name)
document_embedder.warm_up()
documents_with_embeddings = document_embedder.run(documents)

document_store.write_documents(documents_with_embeddings.get("documents"))

print("Number of documents written: ", document_store.count_documents())

# Build the retrieval pipeline
pipeline = Pipeline()
pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder(model=model_name))
pipeline.add_component("retriever", SingleStoreEmbeddingRetriever(document_store=document_store))
pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

# Run a query with metadata filtering
result = pipeline.run(
    data={
        "text_embedder": {"text": "What cities do people live in?"},
        "retriever": {
            "top_k": 5,
            "filters": {"field": "meta.num_of_years", "operator": "==", "value": 3},
        },
    }
)

documents: List[Document] = result["retriever"]["documents"]
print(documents)

```

```output
[Document(id=4014455c3be5d88151ba12d734a16754d7af75c691dfc3a5f364f81772471bd2, content: 'My name is Morgan and I live in Paris.', meta: {'num_of_years': 3}, score: 0.339349627494812, embedding: vector of size 384)]
```
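Haystack filters can also be nested with logical operators. The following sketch evaluates a compound filter against plain metadata dictionaries to show what such a filter selects. `matches` is a hypothetical local evaluator covering only a subset of the Haystack filter syntax (`AND`, `OR`, and basic comparisons); in a real pipeline the document store performs the filtering, and which operators `SingleStoreDocumentStore` supports is not confirmed here.

```python
import operator
from typing import Any, Dict

# Comparison operators from the Haystack filter syntax (subset)
COMPARISONS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "in": lambda value, options: value in options,
    "not in": lambda value, options: value not in options,
}


def matches(filters: Dict[str, Any], meta: Dict[str, Any]) -> bool:
    """Evaluate a Haystack-style filter against one document's metadata."""
    op = filters["operator"]
    if op == "AND":
        return all(matches(cond, meta) for cond in filters["conditions"])
    if op == "OR":
        return any(matches(cond, meta) for cond in filters["conditions"])
    # Comparison: filter fields address metadata with a "meta." prefix
    field = filters["field"]
    if field.startswith("meta."):
        field = field[len("meta."):]
    if field not in meta:
        return False  # simplification: a missing field never matches
    return COMPARISONS[op](meta[field], filters["value"])


compound_filter = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.num_of_years", "operator": ">=", "value": 3},
        {"field": "meta.num_of_years", "operator": "<", "value": 7},
    ],
}

metadata = [{"num_of_years": 3}, {"num_of_years": 7}]
matched = [m for m in metadata if matches(compound_filter, m)]
print(matched)  # [{'num_of_years': 3}]
```

A dictionary with the same shape as `compound_filter` can be passed as the `filters` value in the retriever's run parameters, in place of the single comparison used in the pipeline above.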

## Examples

Refer to the [singlestore-haystack](https://github.com/singlestore-labs/singlestore-haystack?tab=readme-ov-file#more-examples) GitHub repository for more examples.

## References

* [Vector Indexing](https://docs.singlestore.com/db/v9.1/reference/sql-reference/vector-functions/vector-indexing.md)
* [Vector Type](https://docs.singlestore.com/db/v9.1/reference/sql-reference/data-types/vector-type.md)
* [Working with Vector Data](https://docs.singlestore.com/db/v9.1/developer-resources/functional-extensions/working-with-vector-data.md)
* [Working with Full-Text Search](https://docs.singlestore.com/db/v9.1/developer-resources/functional-extensions/working-with-full-text-search.md)
* [DOT\_PRODUCT](https://docs.singlestore.com/db/v9.1/reference/sql-reference/vector-functions/dot-product.md)
* [EUCLIDEAN\_DISTANCE](https://docs.singlestore.com/db/v9.1/reference/sql-reference/vector-functions/euclidean-distance.md)

***

Modified at: April 24, 2026

Source: [/db/v9.1/developer-resources/connect-with-application-development-tools/connect-with-haystack/](https://docs.singlestore.com/db/v9.1/developer-resources/connect-with-application-development-tools/connect-with-haystack/)

