Connect with Haystack

Haystack by Deepset is an open-source framework for building search and retrieval-augmented generation (RAG) applications. The singlestore-haystack library enables you to integrate your SingleStore database as a Document Store in Haystack to store and index documents and their metadata. During queries, Retriever components fetch these documents from the Document Store and pass them to the rest of the pipeline for additional processing.

The singlestore-haystack library uses the SingleStore Python client to interact with the SingleStore database. Refer to the singlestore-haystack GitHub repository for its source code and related information.

Install singlestore-haystack

The singlestore-haystack library can be installed using the standard Python package installation process:

pip install singlestore-haystack

Data Storage Model

SingleStoreDocumentStore stores documents as rows in a SingleStore table. Vector embeddings are stored in a VECTOR type column in the table.

SingleStoreDocumentStore automatically creates the required vector and full-text indexes if they do not already exist. When using SingleStoreEmbeddingRetriever, documents must be embedded before they are written to the database. Use a Haystack embedder to generate these embeddings. For example, use the SentenceTransformersDocumentEmbedder in an indexing pipeline to generate document embeddings before storing them in SingleStore.

[Figure: SingleStoreDocumentStore data storage model]

In this infographic:

  • Haystack table is a SingleStore table used by SingleStoreDocumentStore to persist Haystack Document objects as rows.

  • embedding is a property of the document, which is stored as a vector of type VECTOR(n, F32).

  • content is a property of the document.

  • vector indexes are SingleStore vector indexes created on the embedding column to enable efficient search for dense retrieval.

  • fulltext index is a SingleStore full-text index created on the content column to support BM25-based sparse retrieval.

  • write_documents represents the insert operation where SingleStoreDocumentStore stores documents in the table.

  • retrieve_documents represents the retrieval operations run by retrievers, such as SingleStoreEmbeddingRetriever (for vector search) and SingleStoreBM25Retriever (for full-text search).

For example, consider the following code:

from haystack import Document
from haystack_integrations.document_stores.singlestore_haystack import SingleStoreDocumentStore

# Initialize the document store (uses S2_CONN_STR by default)
document_store = SingleStoreDocumentStore(
    database_name="haystack_db",
    table_name="haystack_documents",
    embedding_dimension=384,
)

# Each Document becomes a row in the SingleStore table
documents = [
    Document(
        content="SingleStore is a distributed SQL database built to power intelligent applications.",
        embedding=[0.1] * 384,  # VECTOR(384, F32) column
        meta={
            "num_of_years": 3,  # stored as JSON/metadata column
        },
    )
]

# Insert documents into SingleStore
document_store.write_documents(documents)
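Because the embedding column has a fixed dimension, it can help to check embedding lengths before writing. The following is a minimal sketch, not part of singlestore-haystack: the helper name find_dimension_mismatches is hypothetical, and SimpleNamespace objects stand in for Haystack Document objects so that the snippet runs without a database. It relies only on each document exposing an embedding attribute.

```python
from types import SimpleNamespace


def find_dimension_mismatches(documents, expected_dim):
    """Return the indexes of documents whose embedding is missing or has the wrong length."""
    return [
        index
        for index, doc in enumerate(documents)
        if doc.embedding is None or len(doc.embedding) != expected_dim
    ]


# Stand-ins for Haystack Document objects (illustration only)
docs = [
    SimpleNamespace(content="ok", embedding=[0.1] * 384),
    SimpleNamespace(content="wrong size", embedding=[0.1] * 128),
    SimpleNamespace(content="not embedded", embedding=None),
]

bad = find_dimension_mismatches(docs, expected_dim=384)
print(bad)  # [1, 2]
```

Running a check like this before write_documents() surfaces a mismatch between the embedder model and the configured embedding_dimension early, instead of at insert time.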

Supported Components

This library implements the DocumentStore protocol methods; import the SingleStoreDocumentStore implementation as follows:

from haystack_integrations.document_stores.singlestore_haystack import SingleStoreDocumentStore

In addition to SingleStoreDocumentStore, the singlestore-haystack library includes the following Haystack Retriever components that can be used in a pipeline:

  • SingleStoreEmbeddingRetriever: Queries the SingleStore vector index to find semantically related documents. This component uses SingleStoreDocumentStore to perform vector similarity search over stored vector embeddings.

    from haystack_integrations.components.retrievers.singlestore_haystack import SingleStoreEmbeddingRetriever
  • SingleStoreBM25Retriever: Performs sparse retrieval using the BM25 ranking algorithm. It leverages SingleStore full-text search (FTS) capabilities to retrieve documents based on keyword relevance instead of vector similarity (embeddings). This component uses SingleStoreDocumentStore to execute BM25 queries. SingleStore recommends using this component for keyword-based and hybrid search scenarios.

    from haystack_integrations.components.retrievers.singlestore_haystack import SingleStoreBM25Retriever

    You can specify either of the following scoring functions:

    • BM25

    • BM25_GLOBAL

    Refer to BM25 for more information. For example:

    retriever = SingleStoreBM25Retriever(document_store=document_store)
    results = retriever.run(
        query="database",
        top_k=2,
        bm25_function="BM25",
    )["documents"]
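To build intuition for why the two scoring functions can return different scores, the following sketch implements a textbook BM25 formula in plain Python. It assumes that BM25 computes term statistics per partition while BM25_GLOBAL computes them across the whole table; verify this against the SingleStore BM25 documentation. The constants k1 and b, the toy partitions, and the function name bm25_score are illustrative, and the exact formula SingleStore uses may differ.

```python
import math


def bm25_score(term, doc, corpus, k1=1.2, b=0.75):
    """Textbook BM25 score of a single-term query for one tokenized document."""
    n_docs = len(corpus)
    doc_freq = sum(1 for d in corpus if term in d)
    idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1)
    avg_len = sum(len(d) for d in corpus) / n_docs
    tf = doc.count(term)
    return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avg_len))


# Two toy "partitions" of tokenized documents
partition_a = [["singlestore", "database"], ["vector", "database"]]
partition_b = [["haystack", "pipeline"], ["rag", "pipeline"]]
doc = partition_a[0]

# Statistics from one partition only versus statistics from all documents
local_score = bm25_score("database", doc, partition_a)
global_score = bm25_score("database", doc, partition_a + partition_b)
print(local_score < global_score)  # True
```

In this toy data, "database" appears in every document of the first partition, so it looks uninformative locally and scores low. Across all documents it is rarer, so the global score is higher.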

Use SingleStore as a Document Store

Prerequisites

Ensure the following prerequisites are met before running the examples in this section:

  • An active SingleStore workspace.

  • The singlestore-haystack package is installed.

  • (Optional) Install the sentence-transformers Python library. It provides pre-trained models used in this example to generate vector embeddings.

    pip install sentence-transformers

Configure the Connection to SingleStore

To keep the credentials out of the source code, assign the connection string to the S2_CONN_STR environment variable in the following format:

export S2_CONN_STR="singlestoredb://<username>:<password>@<hostname>:<port>/[<database>]"

where,

  • hostname: IP address or hostname of the SingleStore workspace.

  • port: Port of the SingleStore workspace. The default is 3306.

  • username: Username of the SingleStore database user.

  • password: Password for the SingleStore database user.

  • database: (Optional) Name of the SingleStore database to connect with.
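The connection string format above can also be assembled in Python, as in the following minimal sketch. The helper name build_conn_str and the sample values are hypothetical. Percent-encoding the credentials is an assumption based on general URL rules for reserved characters; confirm how the SingleStore Python client parses them before relying on it.

```python
import os
from urllib.parse import quote


def build_conn_str(username, password, hostname, port=3306, database=None):
    """Assemble a singlestoredb:// connection string (hypothetical helper)."""
    # Percent-encode credentials so characters such as '@' or '/' do not break the URL
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    conn_str = f"singlestoredb://{user}:{pwd}@{hostname}:{port}/"
    if database:
        conn_str += database
    return conn_str


# Placeholder values for illustration only
conn_str = build_conn_str("admin", "p@ss/word", "localhost", 3306, "haystack_db")

# Visible to this process and its children only; it does not persist in the shell
os.environ["S2_CONN_STR"] = conn_str
print(conn_str)
```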

Alternatively, specify the connection configuration while instantiating the class:

document_store = SingleStoreDocumentStore(
    host="<hostname>",
    port=<port>,
    username="<username>",
    password="<password>",
    database_name="<database>",
    table_name="<table>",  # Name of the SingleStore table used to store Documents
)

Configure Indexes

SingleStoreDocumentStore supports creating and customizing indexes on the SingleStore table. Refer to Working with Vector Data for more information. Based on the retrieval strategy, enable or disable specific index types and configure the index accordingly. For example:

from haystack_integrations.document_stores.singlestore_haystack import SingleStoreDocumentStore

document_store = SingleStoreDocumentStore(
    database_name="haystack_db",
    table_name="haystack_documents",
    embedding_dimension=768,
    # Enable FULLTEXT index for keyword/BM25 search
    use_fulltext_index=True,
    fulltext_index_options={
        "analyzer": "standard",
    },
    # Enable vector index optimized for dot product similarity
    use_dot_product_vector_index=True,
    dot_product_vector_index_options={
        "nlist": 128,
    },
    # Optionally disable Euclidean-distance index if not needed
    use_euclidian_distance_vector_index=False,
)

Specify the following options as applicable when instantiating a SingleStoreDocumentStore object:

Dot Product Optimized Vector Index

| Option | Description |
| --- | --- |
| use_dot_product_vector_index | Creates a vector index using dot product similarity. |
| dot_product_vector_index_options | Specifies a dictionary that contains options for configuring the vector index that uses dot product similarity. These options are forwarded to SingleStore. Refer to Vector Index Options for information on supported options. |

Euclidean Distance Optimized Vector Index

| Option | Description |
| --- | --- |
| use_euclidian_distance_vector_index | Creates a vector index using Euclidean (L2) distance similarity. |
| euclidian_distance_vector_index_options | Specifies a dictionary that contains additional options for configuring the vector index that uses Euclidean distance similarity. These options are forwarded to SingleStore. Refer to Vector Index Options for information on supported options. |
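The two metrics can rank the same vectors differently, which is why the choice of index matters. The following sketch shows this with toy vectors in plain Python; the vectors and names are made up, and nothing here calls SingleStore.

```python
import math


def dot_product(a, b):
    """Higher values mean more similar."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Lower values mean more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 0.0]
vectors = {
    "long_same_direction": [3.0, 0.0],
    "short_nearby": [0.9, 0.1],
}

best_by_dot = max(vectors, key=lambda name: dot_product(query, vectors[name]))
best_by_l2 = min(vectors, key=lambda name: euclidean_distance(query, vectors[name]))
print(best_by_dot)  # long_same_direction
print(best_by_l2)   # short_nearby
```

Dot product rewards vector magnitude, whereas Euclidean distance rewards proximity. When all embeddings are normalized to unit length, the two metrics produce the same ranking.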

Full Text Index

Note

The full-text index is required for keyword-based retrieval using the SingleStoreBM25Retriever.

| Option | Description |
| --- | --- |
| use_fulltext_index | Creates a full-text index (version 2). |
| fulltext_index_options | Specifies a dictionary that contains additional options for configuring the full-text index. These options are forwarded to SingleStore. Refer to Working with Full-Text Search for information on supported options. |

Hybrid Retrieval

To support hybrid retrieval, enable the vector and full-text indexes at the same time. This allows a single Haystack pipeline to combine dense (semantic) and sparse (keyword-based) search.
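One common way to combine dense and sparse results is reciprocal rank fusion (RRF). The source does not prescribe a fusion method, so the following is only a sketch of the idea in plain Python: the document IDs are made up, and k=60 is a conventional default rather than a library setting.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of document IDs; items ranked high in several lists win."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical result lists, best match first
dense_hits = ["doc_a", "doc_b", "doc_c"]   # for example, from an embedding retriever
sparse_hits = ["doc_b", "doc_d", "doc_a"]  # for example, from a BM25 retriever

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Inside a Haystack pipeline, the built-in DocumentJoiner component provides a reciprocal rank fusion join mode, which is likely a better fit than a hand-written function.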

Write Documents

To write documents to SingleStore, use either of the following:

  • SingleStoreDocumentStore.write_documents() method

  • DocumentWriter component

write_documents() Example

The following example generates the embeddings using SentenceTransformersDocumentEmbedder and then writes the document to SingleStore.

from haystack import Document
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack_integrations.document_stores.singlestore_haystack import SingleStoreDocumentStore

# Initialize the document store
document_store = SingleStoreDocumentStore(
    database_name="haystack_db",  # SingleStore database
    table_name="haystack_documents",  # SingleStore table for Documents
    embedding_dimension=384,  # Dimension of embeddings
)

# Create documents
documents = [
    Document(content="SingleStore is a distributed SQL database built to power intelligent applications.")
]

# Create the document embedder
document_embedder = SentenceTransformersDocumentEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2"
)

# Download the model and prepare it (first run only)
document_embedder.warm_up()

# Generate embeddings
result = document_embedder.run(documents)
documents_with_embeddings = result["documents"]

# Write documents (with embeddings) to SingleStore
document_store.write_documents(documents_with_embeddings)

DocumentWriter Example

The following example creates a Haystack pipeline to write documents to SingleStore:

from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.writers import DocumentWriter
from haystack_integrations.document_stores.singlestore_haystack import SingleStoreDocumentStore

# Input documents
documents = [
    Document(content="SingleStore is a distributed SQL database built to power intelligent applications."),
    Document(content="SingleStore is delivered as a SaaS data platform (SingleStore Helios) and is available in AWS, Azure, and GCP."),
]

# Initialize the document store
document_store = SingleStoreDocumentStore(
    table_name="haystack_documents",
    embedding_dimension=384,
    recreate_table=True,  # Recreate the table if it already exists
)

# Components
embedder = SentenceTransformersDocumentEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2"
)
writer = DocumentWriter(document_store=document_store)

# Build the pipeline
pipeline = Pipeline()
pipeline.add_component(instance=embedder, name="embedder")
pipeline.add_component(instance=writer, name="writer")
pipeline.connect("embedder", "writer")

# Run the indexing pipeline
result = pipeline.run({"embedder": {"documents": documents}})
print(result)

{'writer': {'documents_written': 2}}

Retrieve Documents

Use the SingleStoreEmbeddingRetriever component to retrieve documents from SingleStore.

For example, consider the following Haystack pipeline that finds documents using a vector index and metadata filtering:

from typing import List

from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder
from haystack_integrations.components.retrievers.singlestore_haystack import SingleStoreEmbeddingRetriever
from haystack_integrations.document_stores.singlestore_haystack import SingleStoreDocumentStore

# Initialize the document store
document_store = SingleStoreDocumentStore(
    database_name="haystack_db",  # The name of the database in SingleStore
    table_name="haystack_documents",  # The name of the table to store Documents
    embedding_dimension=384,  # The dimension of the embeddings being stored
    recreate_table=True,
)

# Sample documents with metadata
documents = [
    Document(content="My name is Morgan and I live in Paris.", meta={"num_of_years": 3}),
    Document(content="I am Susan and I live in Berlin.", meta={"num_of_years": 7}),
]

# The same model is used for both query and Document embeddings
model_name = "sentence-transformers/all-MiniLM-L6-v2"

# Embed and write documents
document_embedder = SentenceTransformersDocumentEmbedder(model=model_name)
document_embedder.warm_up()
documents_with_embeddings = document_embedder.run(documents)
document_store.write_documents(documents_with_embeddings.get("documents"))
print("Number of documents written: ", document_store.count_documents())

# Build the retrieval pipeline
pipeline = Pipeline()
pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder(model=model_name))
pipeline.add_component("retriever", SingleStoreEmbeddingRetriever(document_store=document_store))
pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

# Run a query with metadata filtering
result = pipeline.run(
    data={
        "text_embedder": {"text": "What cities do people live in?"},
        "retriever": {
            "top_k": 5,
            "filters": {"field": "meta.num_of_years", "operator": "==", "value": 3},
        },
    }
)
documents: List[Document] = result["retriever"]["documents"]
print(documents)

[Document(id=4014455c3be5d88151ba12d734a16754d7af75c691dfc3a5f364f81772471bd2, content: 'My name is Morgan and I live in Paris.', meta: {'num_of_years': 3}, score: 0.339349627494812, embedding: vector of size 384)]
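The filters argument in the query above follows the Haystack filter syntax, in which comparison conditions can be nested under logical operators such as AND and OR. The following sketch evaluates such a filter against plain metadata dictionaries to show the semantics. The matches function is a hypothetical illustration, not how SingleStoreDocumentStore applies filters, and whether the document store supports every operator shown is an assumption to verify against the library.

```python
import operator

COMPARATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def matches(meta, flt):
    """Evaluate a Haystack-style filter dict against a metadata dict (illustration only)."""
    if "conditions" in flt:
        results = [matches(meta, cond) for cond in flt["conditions"]]
        if flt["operator"] == "AND":
            return all(results)
        if flt["operator"] == "OR":
            return any(results)
        raise ValueError(f"Unsupported logical operator: {flt['operator']}")
    field = flt["field"]
    key = field[len("meta."):] if field.startswith("meta.") else field
    return COMPARATORS[flt["operator"]](meta[key], flt["value"])


# Compound filter: 3 <= num_of_years < 5
compound_filter = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.num_of_years", "operator": ">=", "value": 3},
        {"field": "meta.num_of_years", "operator": "<", "value": 5},
    ],
}

people = {"Morgan": {"num_of_years": 3}, "Susan": {"num_of_years": 7}}
selected = [name for name, meta in people.items() if matches(meta, compound_filter)]
print(selected)  # ['Morgan']
```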

Examples

Refer to the singlestore-haystack GitHub repository for more examples.
