Connect with Haystack

Haystack by deepset is an open-source framework for building search and retrieval-augmented generation (RAG) applications. The singlestore-haystack library lets you use your SingleStore database as a Document Store in Haystack to store and index documents and their metadata. During queries, Haystack Retriever components fetch documents from the Document Store and pass them to downstream components for further processing.

The singlestore-haystack library uses the SingleStore Python client to interact with the SingleStore database. Refer to the singlestore-haystack GitHub repository for its source code and related information.

Install singlestore-haystack

Install the singlestore-haystack library using pip:

Shell

pip install singlestore-haystack

Data Storage Model

SingleStoreDocumentStore stores documents as rows in a SingleStore table. Vector embeddings are stored in a VECTOR type column in the table.

SingleStoreDocumentStore automatically creates the required vector and full-text indexes if they do not already exist. When using SingleStoreEmbeddingRetriever, documents must be embedded before they are written to the database. Use a Haystack embedder to generate these embeddings. For example, use the SentenceTransformersDocumentEmbedder in an indexing pipeline to generate document embeddings before storing them in SingleStore.

[Figure: SingleStoreDocumentStore data storage model]

In this infographic:

Haystack table is a SingleStore table used by SingleStoreDocumentStore to persist Haystack Document objects as rows.
embedding is a property of the document, which is stored as a vector of type VECTOR(n, F32), where n is the configured embedding_dimension.
content is a property of the document.
vector indexes are SingleStore vector indexes created on the embedding column to enable efficient search for dense retrieval.
fulltext index is a SingleStore full-text index created on the content column to support BM25-based sparse retrieval.
write_documents represents the insert operation where SingleStoreDocumentStore stores documents in the table.
retrieve_documents represents the retrieval operations run by retrievers, such as SingleStoreEmbeddingRetriever (for vector search) and SingleStoreBM25Retriever (for full-text search).

For example, consider the following code:

Python

from haystack import Document
from haystack_integrations.document_stores.singlestore_haystack import SingleStoreDocumentStore

# Initialize the document store (uses S2_CONN_STR by default)
document_store = SingleStoreDocumentStore(
    database_name="haystack_db",
    table_name="haystack_documents",
    embedding_dimension=384,
)

# Each Document becomes a row in the SingleStore table
documents = [
    Document(
        content="SingleStore is a distributed SQL database built to power intelligent applications.",
        embedding=[0.1] * 384,  # VECTOR(384, F32) column
        meta={
            "num_of_years": 3,   # stored as JSON/metadata column
        },
    )
]

# Insert documents into SingleStore
document_store.write_documents(documents)
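
Setting embedding_dimension=384 fixes the column type as VECTOR(384, F32). As a result, every embedding must contain exactly 384 values. Each value is also stored as a 32-bit float, so embeddings read back may differ slightly from the 64-bit Python floats you wrote. The following stdlib-only sketch illustrates both points. validate_embeddings and as_f32 are hypothetical helpers, not part of singlestore-haystack, and documents are modeled as plain dicts to keep the example self-contained.

```python
import struct


def validate_embeddings(documents, dimension):
    """Hypothetical pre-write check: raise ValueError if an embedding is missing or mis-sized."""
    for i, doc in enumerate(documents):
        embedding = doc.get("embedding")
        if embedding is None:
            raise ValueError(f"document {i} has no embedding; run an embedder first")
        if len(embedding) != dimension:
            raise ValueError(f"document {i} has {len(embedding)} values, expected {dimension}")


def as_f32(values):
    """Round each value to the nearest 32-bit float, the precision of an F32 element."""
    packed = struct.pack(f"<{len(values)}f", *values)
    return list(struct.unpack(f"<{len(values)}f", packed))


docs = [{"content": "SingleStore is a distributed SQL database.", "embedding": [0.1] * 384}]
validate_embeddings(docs, 384)  # passes silently
print(as_f32([0.1])[0] == 0.1)  # False: 0.1 has no exact float32 representation
```

If you compare embeddings after a round trip through the database, compare them with a small tolerance rather than exact equality.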

Supported Components

The SingleStoreDocumentStore class implements the Haystack DocumentStore protocol. Import it as follows:

Python

from haystack_integrations.document_stores.singlestore_haystack import SingleStoreDocumentStore

In addition to SingleStoreDocumentStore, the singlestore-haystack library includes the following Haystack Retriever components that can be used in a pipeline:

SingleStoreEmbeddingRetriever: Queries the SingleStore vector index and finds semantically related documents. This component uses SingleStoreDocumentStore to perform vector similarity search over the stored embeddings.
Python
```
from haystack_integrations.components.retrievers.singlestore_haystack import SingleStoreEmbeddingRetriever
```
SingleStoreBM25Retriever: Performs sparse retrieval using the BM25 ranking algorithm. It uses SingleStore full-text search (FTS) to retrieve documents based on keyword relevance instead of vector similarity (embeddings). This component uses SingleStoreDocumentStore to execute BM25 queries. SingleStore recommends using this component for keyword-based and hybrid search scenarios.
Python
```
from haystack_integrations.components.retrievers.singlestore_haystack import SingleStoreBM25Retriever
```
Use the bm25_function argument to specify either of the following scoring functions:
- BM25
- BM25_GLOBAL
Refer to BM25 for more information. For example:
Python
```
retriever = SingleStoreBM25Retriever(document_store=document_store)
results = retriever.run(
    query="database",
    top_k=2,
    bm25_function="BM25",
)["documents"]
```
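
To show what a BM25 scoring function computes, the following stdlib-only sketch implements the textbook Okapi BM25 formula. It uses the common defaults k1 = 1.2 and b = 0.75, with lowercased whitespace tokenization. This is an illustration, not SingleStore's implementation, whose tokenization and parameters may differ. The sketch also shows why the set of rows used for corpus statistics matters. The assumption here, suggested by the names, is that BM25 uses per-partition statistics while BM25_GLOBAL uses statistics across all partitions. Under that assumption, the same document can score differently with each function. Confirm the exact semantics in the BM25 reference.

```python
import math
from collections import Counter


def bm25_scores(query, corpus, k1=1.2, b=0.75):
    """Return a textbook Okapi BM25 score for every document in corpus."""
    docs = [text.lower().split() for text in corpus]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue  # term absent from this document contributes nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


corpus = [
    "singlestore is a distributed sql database",
    "haystack builds rag pipelines",
    "a database stores rows",
]
global_scores = bm25_scores("database", corpus)     # statistics over every row
local_scores = bm25_scores("database", corpus[:2])  # statistics over a subset, as one partition might see
print(global_scores[1])                     # 0.0: no matching term
print(global_scores[0] == local_scores[0])  # False: same document, different statistics
```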

Use SingleStore as a Document Store

Prerequisites

Ensure that the following prerequisites are met before running the examples in this section:

An active SingleStore workspace.
Install the singlestore-haystack package.
(Optional) Install the sentence-transformers Python library. It provides the pre-trained models that the examples in this section use to generate vector embeddings.
Shell
```
pip install sentence-transformers
```

Configure the Connection to SingleStore

To keep the credentials out of the source code, assign the connection string to the S2_CONN_STR environment variable in the following format:

Shell

export S2_CONN_STR="singlestoredb://<username>:<password>@<hostname>:<port>/[<database>]"

where,

hostname: IP address or hostname of the SingleStore workspace.
port: Port of the SingleStore workspace. The default is 3306.
username: Username of the SingleStore database user.
password: Password for the SingleStore database user.
database: (Optional) Name of the SingleStore database to connect with.
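
A password may contain characters that are reserved in URLs, such as @, :, or /. Percent-encode such a password before building the connection string; otherwise those characters can be parsed as delimiters. The following sketch assumes the connection string is parsed like a standard URL. build_conn_str is a hypothetical helper, and the hostname is a placeholder.

```python
import os
from urllib.parse import quote, urlsplit


def build_conn_str(username, password, hostname, port=3306, database=None):
    """Hypothetical helper: build a singlestoredb:// URL with percent-encoded credentials."""
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    url = f"singlestoredb://{user}:{pwd}@{hostname}:{port}"
    return f"{url}/{database}" if database else url


conn_str = build_conn_str("admin", "p@ss:word", "svc-1234.example.com", 3306, "haystack_db")
print(conn_str)
# singlestoredb://admin:p%40ss%3Aword@svc-1234.example.com:3306/haystack_db

# The URL parses back into the expected parts
parts = urlsplit(conn_str)
print(parts.hostname, parts.port, parts.path)  # svc-1234.example.com 3306 /haystack_db

# Export for this process; SingleStoreDocumentStore reads S2_CONN_STR by default
os.environ["S2_CONN_STR"] = conn_str
```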

Alternatively, specify the connection configuration while instantiating the class:

Python

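from haystack_integrations.document_stores.singlestore_haystack import SingleStoreDocumentStore

document_store = SingleStoreDocumentStore(
    host="<hostname>",
    port=<port>,                  # Replace with an integer, for example 3306
    username="<username>",
    password="<password>",
    database_name="<database>",
    table_name="<table>",         # Name of the SingleStore table used to store Documents
)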
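
You can pass explicit parameters without hard-coding credentials by reading each value from its own environment variable. In this sketch, the variable names (S2_HOST, S2_PORT, S2_USER, S2_PASSWORD, S2_DATABASE) and the connection_kwargs helper are hypothetical. Only the keyword names match the constructor parameters shown above.

```python
import os


def connection_kwargs(env=None):
    """Hypothetical helper: collect SingleStoreDocumentStore connection arguments.

    The S2_* variable names are illustrative; use whatever names your deployment defines.
    """
    env = os.environ if env is None else env
    return {
        "host": env["S2_HOST"],
        "port": int(env.get("S2_PORT", "3306")),  # 3306 is the default SingleStore port
        "username": env["S2_USER"],
        "password": env["S2_PASSWORD"],
        "database_name": env["S2_DATABASE"],
    }
```

Once these variables are set, SingleStoreDocumentStore(**connection_kwargs(), table_name="<table>") passes the same arguments as the example above, without credentials in the source.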

Configure Indexes

SingleStoreDocumentStore supports creating and customizing indexes on the SingleStore table. Refer to Working with Vector Data for more information. Enable or disable specific index types, and configure their options, based on your retrieval strategy. For example:

Python

from haystack_integrations.document_stores.singlestore_haystack import SingleStoreDocumentStore

document_store = SingleStoreDocumentStore(
    database_name="haystack_db",
    table_name="haystack_documents",
    embedding_dimension=768,

    # Enable FULLTEXT index for keyword/BM25 search
    use_fulltext_index=True,
    fulltext_index_options={
        "analyzer": "standard",
    },

    # Enable vector index optimized for dot product similarity
    use_dot_product_vector_index=True,
    dot_product_vector_index_options={
        "nlist": 128,
    },

    # Optionally disable Euclidean-distance index if not needed
    use_euclidian_distance_vector_index=False,
)

Specify the following options as applicable when instantiating a SingleStoreDocumentStore object:

Dot Product Optimized Vector Index

| Option | Description |
| --- | --- |
| `use_dot_product_vector_index` | Creates a vector index using dot product similarity. |
| `dot_product_vector_index_options` | Specifies a dictionary that contains options for configuring the vector index that uses dot product similarity. These options are forwarded to SingleStore. Refer to Vector Index Options for information on supported options. |

Euclidean Distance Optimized Vector Index

| Option | Description |
| --- | --- |
| `use_euclidian_distance_vector_index` | Creates a vector index using Euclidean (L2) distance similarity. |
| `euclidian_distance_vector_index_options` | Specifies a dictionary that contains additional options for configuring the vector index that uses Euclidean distance similarity. These options are forwarded to SingleStore. Refer to Vector Index Options for information on supported options. |
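
Which metric to index depends on how the embeddings were produced. For unit-length (normalized) vectors, the squared Euclidean distance equals 2 − 2 × (dot product), so both metrics rank documents identically. For unnormalized vectors, the rankings can diverge. The following stdlib-only sketch, independent of singlestore-haystack, demonstrates this with small made-up vectors.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def rankings(query, docs):
    """Rank doc IDs by highest dot product and by smallest Euclidean distance."""
    by_dot = sorted(docs, key=lambda k: dot(query, docs[k]), reverse=True)
    by_l2 = sorted(docs, key=lambda k: euclidean(query, docs[k]))
    return by_dot, by_l2


query = [1.0, 0.0]
raw = {"a": [0.9, 0.1], "b": [3.0, 3.0], "c": [0.2, -0.6]}
print(rankings(query, raw))  # (['b', 'a', 'c'], ['a', 'c', 'b']): rankings differ

unit = {k: normalize(v) for k, v in raw.items()}
print(rankings(normalize(query), unit))  # (['a', 'b', 'c'], ['a', 'b', 'c']): identical
```

Haystack's SentenceTransformers embedders expose a normalize_embeddings parameter if you want unit-length vectors.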

Full Text Index

Note

The full-text index is required for keyword-based retrieval using the SingleStoreBM25Retriever.

| Option | Description |
| --- | --- |
| `use_fulltext_index` | Creates a full-text index (version 2). |
| `fulltext_index_options` | Specifies a dictionary that contains additional options for configuring the full-text index. These options are forwarded to SingleStore. Refer to Working with Full-Text Search for information on supported options. |

Hybrid Retrieval

To support hybrid retrieval, enable both vector and full-text indexes on the same table. This lets you combine dense (semantic) and sparse (keyword-based) search within the same Haystack pipeline.
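
One common way to merge the dense and sparse result lists in such a pipeline is reciprocal rank fusion (RRF), which Haystack's DocumentJoiner offers as the reciprocal_rank_fusion join mode. The following stdlib-only sketch shows the idea behind RRF with the conventional constant k = 60. It is not Haystack's exact implementation, which may weight and normalize scores differently. The document IDs are placeholders.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked lists of document IDs; each list adds 1 / (k + rank) per document."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)


dense = ["d1", "d2", "d3"]   # e.g., results from SingleStoreEmbeddingRetriever
sparse = ["d3", "d1", "d4"]  # e.g., results from SingleStoreBM25Retriever
print(reciprocal_rank_fusion([dense, sparse]))  # ['d1', 'd3', 'd2', 'd4']
```

Documents that rank well in both lists (d1, d3) rise to the top, even if neither retriever ranked them first on its own.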

Write Documents

To write documents to SingleStore, use either of the following:

SingleStoreDocumentStore.write_documents() method
DocumentWriter component

write_documents() Example

The following example generates embeddings using SentenceTransformersDocumentEmbedder and then writes the documents to SingleStore.

Python

from haystack import Document
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack_integrations.document_stores.singlestore_haystack import SingleStoreDocumentStore

# Initialize the document store
document_store = SingleStoreDocumentStore(
    database_name="haystack_db",        # SingleStore database
    table_name="haystack_documents",    # SingleStore table for Documents
    embedding_dimension=384,            # Dimension of embeddings
)

# Create documents
documents = [
    Document(content="SingleStore is a distributed SQL database built to power intelligent applications.")
]

# Create the document embedder
document_embedder = SentenceTransformersDocumentEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2"
)

# Load the model (it is downloaded on the first run only)
document_embedder.warm_up()

# Generate embeddings
result = document_embedder.run(documents)
documents_with_embeddings = result["documents"]

# Write documents (with embeddings) to SingleStore
document_store.write_documents(documents_with_embeddings)

DocumentWriter Example

The following example creates a Haystack pipeline to write documents to SingleStore:

Python

from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.writers import DocumentWriter
from haystack_integrations.document_stores.singlestore_haystack import SingleStoreDocumentStore

# Input documents
documents = [
    Document(content="SingleStore is a distributed SQL database built to power intelligent applications."),
    Document(content="SingleStore is delivered as a SaaS data platform (SingleStore Helios) and is available in AWS, Azure, and GCP."),
]

# Initialize the document store
document_store = SingleStoreDocumentStore(
    table_name="haystack_documents",
    embedding_dimension=384,
    recreate_table=True,   # Recreate the table if it already exists
)

# Components
embedder = SentenceTransformersDocumentEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2"
)
writer = DocumentWriter(document_store=document_store)

# Build the pipeline
pipeline = Pipeline()
pipeline.add_component(instance=embedder, name="embedder")
pipeline.add_component(instance=writer, name="writer")
pipeline.connect("embedder", "writer")

# Run the indexing pipeline
result = pipeline.run({"embedder": {"documents": documents}})
print(result)  # {'writer': {'documents_written': 2}}

`{'writer': {'documents_written': 2}}`

Retrieve Documents

Use the SingleStoreEmbeddingRetriever component to retrieve documents from SingleStore.

For example, consider the following Haystack pipeline that finds documents using the vector index and metadata filtering:

Python

from typing import List

from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder

from haystack_integrations.components.retrievers.singlestore_haystack import SingleStoreEmbeddingRetriever
from haystack_integrations.document_stores.singlestore_haystack import SingleStoreDocumentStore

# Initialize the document store
document_store = SingleStoreDocumentStore(
    database_name="haystack_db",  # The name of the database in SingleStore
    table_name="haystack_documents",  # The name of the table to store Documents
    embedding_dimension=384,  # The dimension of the embeddings being stored
    recreate_table=True,
)

# Sample documents with metadata
documents = [
    Document(content="My name is Morgan and I live in Paris.", meta={"num_of_years": 3}),
    Document(content="I am Susan and I live in Berlin.", meta={"num_of_years": 7}),
]

# The same model is used for both query and Document embeddings
model_name = "sentence-transformers/all-MiniLM-L6-v2"

# Embed and write documents
document_embedder = SentenceTransformersDocumentEmbedder(model=model_name)
document_embedder.warm_up()
documents_with_embeddings = document_embedder.run(documents)

document_store.write_documents(documents_with_embeddings.get("documents"))

print("Number of documents written: ", document_store.count_documents())

# Build the retrieval pipeline
pipeline = Pipeline()
pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder(model=model_name))
pipeline.add_component("retriever", SingleStoreEmbeddingRetriever(document_store=document_store))
pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

# Run a query with metadata filtering
result = pipeline.run(
    data={
        "text_embedder": {"text": "What cities do people live in?"},
        "retriever": {
            "top_k": 5,
            "filters": {"field": "meta.num_of_years", "operator": "==", "value": 3},
        },
    }
)

documents: List[Document] = result["retriever"]["documents"]
print(documents)

[Document(id=4014455c3be5d88151ba12d734a16754d7af75c691dfc3a5f364f81772471bd2, content: 'My name is Morgan and I live in Paris.', meta: {'num_of_years': 3}, score: 0.339349627494812, embedding: vector of size 384)]
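
The filters dictionary uses Haystack's filter syntax. A comparison has field, operator, and value keys, and logical operators such as AND and OR combine a list of conditions. The following sketch evaluates such filters against document metadata in plain Python to illustrate their semantics. matches is a hypothetical helper that supports only a subset of operators and treats a missing field as a non-match. It does not reflect how SingleStoreDocumentStore translates filters into SQL.

```python
import operator

# Comparison operators supported by this simplified sketch
COMPARISONS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "in": lambda value, options: value in options,
}


def matches(meta, filters):
    """Hypothetical helper: True if metadata satisfies a Haystack-style filter dict."""
    if "conditions" in filters:
        results = [matches(meta, c) for c in filters["conditions"]]
        if filters["operator"] == "AND":
            return all(results)
        if filters["operator"] == "OR":
            return any(results)
        raise ValueError(f"unsupported logical operator: {filters['operator']}")
    field = filters["field"].removeprefix("meta.")
    if field not in meta:
        return False  # simplification: a missing field never matches
    return COMPARISONS[filters["operator"]](meta[field], filters["value"])


docs = [
    {"content": "My name is Morgan and I live in Paris.", "meta": {"num_of_years": 3}},
    {"content": "I am Susan and I live in Berlin.", "meta": {"num_of_years": 7}},
]
exact = {"field": "meta.num_of_years", "operator": "==", "value": 3}
range_filter = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.num_of_years", "operator": ">=", "value": 3},
        {"field": "meta.num_of_years", "operator": "<", "value": 10},
    ],
}
print([d["content"] for d in docs if matches(d["meta"], exact)])
# ['My name is Morgan and I live in Paris.']
print(sum(matches(d["meta"], range_filter) for d in docs))  # 2
```

A compound filter such as range_filter can be passed as the retriever's filters input in the same way as the single comparison in the pipeline example.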

Examples

Refer to the singlestore-haystack GitHub repository for more examples.
