Connect with Haystack
Haystack by Deepset is an open-source framework for building search and retrieval-augmented generation (RAG) applications. The singlestore-haystack library enables you to integrate your SingleStore database as a Document Store in Haystack to store and index documents and their metadata.
The singlestore-haystack library uses the SingleStore Python client to interact with the SingleStore database.
Install singlestore-haystack
The singlestore-haystack library can be installed using the standard Python package installation process:
pip install singlestore-haystack
Data Storage Model
SingleStoreDocumentStore stores documents as rows in a SingleStore table.
SingleStoreDocumentStore automatically creates the required vector and full-text indexes if they do not already exist. To use SingleStoreEmbeddingRetriever, documents must be embedded before they are written to the database. For example, use SentenceTransformersDocumentEmbedder in an indexing pipeline to generate document embeddings before storing them in SingleStore.
The following is a visual representation:
In this infographic:
- Haystack table is a SingleStore table used by SingleStoreDocumentStore to persist Haystack Document objects as rows.
- embedding is a property of the document, which is stored as a vector of type VECTOR(n, F32).
- content is a property of the document.
- vector indexes are SingleStore vector indexes created on the embedding column to enable efficient search for dense retrieval.
- fulltext index is a SingleStore full-text index created on the content column to support BM25-based sparse retrieval.
- write_documents represents the insert operation where SingleStoreDocumentStore stores documents in the table.
- retrieve_documents represents the retrieval operations run by retrievers, such as SingleStoreEmbeddingRetriever (for vector search) and SingleStoreBM25Retriever (for full-text search).
For example, consider the following code:
    from haystack import Document
    from haystack_integrations.document_stores.singlestore_haystack import SingleStoreDocumentStore

    # Initialize the document store (uses S2_CONN_STR by default)
    document_store = SingleStoreDocumentStore(
        database_name="haystack_db",
        table_name="haystack_documents",
        embedding_dimension=384,
    )

    # Each Document becomes a row in the SingleStore table
    documents = [
        Document(
            content="SingleStore is a distributed SQL database built to power intelligent applications.",
            embedding=[0.1] * 384,  # VECTOR(384, F32) column
            meta={
                "num_of_years": 3,  # stored as JSON/metadata column
            },
        )
    ]

    # Insert documents into SingleStore
    document_store.write_documents(documents)
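Because the embedding column is typed VECTOR(n, F32), the length of each embedding has to match the dimension configured on the document store, and values are held as 32-bit floats. The following is a minimal sketch of a pre-write check using only the standard library. The helper names check_dimension and to_f32 are hypothetical and are not part of singlestore-haystack; the round-trip only illustrates 32-bit float precision and does not reproduce how the library serializes vectors.

```python
import struct


def check_dimension(embedding, expected_dim):
    """Raise early if an embedding does not match the configured dimension (hypothetical helper)."""
    if len(embedding) != expected_dim:
        raise ValueError(
            f"embedding has {len(embedding)} values, expected {expected_dim}"
        )


def to_f32(values):
    """Round-trip Python floats through 32-bit IEEE 754 to show F32 precision (hypothetical helper)."""
    fmt = f"{len(values)}f"
    return list(struct.unpack(fmt, struct.pack(fmt, *values)))


embedding = [0.1] * 384
check_dimension(embedding, 384)  # passes silently

stored = to_f32(embedding)
# 0.1 is not exactly representable in 32 bits, so the stored value differs slightly
print(stored[0] == 0.1, abs(stored[0] - 0.1) < 1e-7)
```

Running a check like this before calling write_documents surfaces a model/dimension mismatch in application code rather than as a database error.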
Supported Components
This library implements the DocumentStore protocol methods; import the SingleStoreDocumentStore implementation as follows:
from haystack_integrations.document_stores.singlestore_haystack import SingleStoreDocumentStore
In addition to SingleStoreDocumentStore, the singlestore-haystack library includes the following Haystack Retriever components that can be used in a pipeline:
- SingleStoreEmbeddingRetriever: Queries the SingleStore vector index and finds semantically related documents. This component uses SingleStoreDocumentStore to perform vector similarity search over stored vector embeddings.

      from haystack_integrations.components.retrievers.singlestore_haystack import SingleStoreEmbeddingRetriever

- SingleStoreBM25Retriever: Performs sparse retrieval using the BM25 ranking algorithm. It leverages SingleStore full-text search (FTS) capabilities to retrieve documents based on keyword relevance instead of vector similarity (embeddings). This component uses SingleStoreDocumentStore to execute BM25 queries. SingleStore recommends using this component for keyword-based and hybrid search scenarios.

      from haystack_integrations.components.retrievers.singlestore_haystack import SingleStoreBM25Retriever

  You can specify either of the following scoring functions:

  - BM25
  - BM25_GLOBAL

  Refer to BM25 for more information.

  For example:

      retriever = SingleStoreBM25Retriever(document_store=document_store)
      results = retriever.run(
          query="database",
          top_k=2,
          bm25_function="BM25",
      )["documents"]
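To give a feel for what BM25 ranking rewards, the following is a minimal sketch of the textbook Okapi BM25 formula in plain Python. It is an illustration only: the function name bm25_scores, the whitespace tokenizer, and the parameter values k1=1.2 and b=0.75 are assumptions, and the scores are not expected to match what SingleStore full-text search returns.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each document against the query terms with textbook Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_terms:
            # Number of documents that contain the term
            doc_freq = sum(1 for other in tokenized if term in other)
            idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)
            freq = term_freq[term]
            # Length normalization: longer documents are penalized
            denom = freq + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * freq * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [
    "singlestore is a distributed sql database",
    "haystack builds search pipelines",
    "a database stores rows",
]
scores = bm25_scores(["database"], docs)
print(scores)
```

In this toy corpus the second document scores zero because it does not contain the query term, and the shorter of the two matching documents scores higher because of length normalization.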
Use SingleStore as a Document Store
Prerequisites
Ensure the following prerequisites are met before running the examples in this section:
- An active SingleStore workspace.
- Install the singlestore_haystack package.
- (Optional) Install the sentence-transformers Python library. It provides pre-trained models used in this example to generate vector embeddings.

      pip install sentence-transformers
Configure the Connection to SingleStore
To keep the credentials out of the source code, assign the connection string to the S2_CONN_STR environment variable in the following format:
export S2_CONN_STR="singlestoredb://<username>:<password>@<hostname>:<port>/[<database>]"
where,
- hostname: IP address or hostname of the SingleStore workspace.
- port: Port of the SingleStore workspace. The default is 3306.
- username: Username of the SingleStore database user.
- password: Password for the SingleStore database user.
- database: (Optional) Name of the SingleStore database to connect with.
Alternatively, specify the connection configuration while instantiating the class:
    document_store = SingleStoreDocumentStore(
        host="<hostname>",
        port=<port>,
        username="<username>",
        password="<password>",
        database_name="<database>",
        table_name="<table>",  # Name of the SingleStore table used to store Documents
    )
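When the connection string is assembled in code rather than typed by hand, reserved characters such as @ or / in the username or password generally need to be percent-encoded so that the URL parses correctly. The following is a minimal sketch using only the standard library. The helper name build_conn_str and the sample credentials are hypothetical; only the S2_CONN_STR variable name and the URL layout come from this page.

```python
import os
from urllib.parse import quote


def build_conn_str(username, password, hostname, port=3306, database=""):
    """Assemble a singlestoredb:// URL, percent-encoding the credentials (hypothetical helper)."""
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"singlestoredb://{user}:{pwd}@{hostname}:{port}/{database}"


# Placeholder credentials for illustration only
conn_str = build_conn_str("admin", "p@ss/word", "localhost", 3306, "haystack_db")
print(conn_str)

# Make the value visible to code in this process that reads S2_CONN_STR
os.environ["S2_CONN_STR"] = conn_str
```

Setting the variable from Python only affects the current process; for a shell session, the export command shown earlier remains the simpler route.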
Configure Indexes
SingleStoreDocumentStore supports creating and customizing indexes on the SingleStore table.
    from haystack_integrations.document_stores.singlestore_haystack import SingleStoreDocumentStore

    document_store = SingleStoreDocumentStore(
        database_name="haystack_db",
        table_name="haystack_documents",
        embedding_dimension=768,
        # Enable FULLTEXT index for keyword/BM25 search
        use_fulltext_index=True,
        fulltext_index_options={
            "analyzer": "standard",
        },
        # Enable vector index optimized for dot product similarity
        use_dot_product_vector_index=True,
        dot_product_vector_index_options={
            "nlist": 128,
        },
        # Optionally disable Euclidean-distance index if not needed
        use_euclidian_distance_vector_index=False,
    )
Specify the following options as applicable when instantiating a SingleStoreDocumentStore object:
Dot Product Optimized Vector Index
| Option | Description |
|---|---|
| use_dot_product_vector_index | Creates a vector index using dot product similarity. |
| dot_product_vector_index_options | Specifies a dictionary that contains options for configuring the vector index that uses dot product similarity. |
Euclidean Distance Optimized Vector Index
| Option | Description |
|---|---|
| use_euclidian_distance_vector_index | Creates a vector index using Euclidean (L2) distance similarity. |
| | Specifies a dictionary that contains additional options for configuring the vector index that uses Euclidean distance similarity. |
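The two vector index types are optimized for different metrics: a higher dot product means more similar, while a lower Euclidean distance means more similar. The following is a minimal sketch in plain Python that computes both for a pair of small vectors. The function names and sample vectors are illustrative assumptions and are unrelated to how SingleStore evaluates these metrics internally.

```python
import math


def dot_product(a, b):
    """Sum of element-wise products; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line (L2) distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot_product(v, v))
    return [x / norm for x in v]


query = normalize([1.0, 2.0, 2.0])
doc = normalize([2.0, 1.0, 2.0])

dp = dot_product(query, doc)
dist = euclidean_distance(query, doc)
print(dp, dist)

# For unit-length vectors the two metrics are tied together:
# distance squared equals 2 - 2 * dot product
print(math.isclose(dist ** 2, 2 - 2 * dp))
```

For unit-length embeddings the two metrics therefore produce the same ranking, so the choice of index matters most when embeddings are not normalized.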
Full Text Index
Note
The full-text index is required for keyword-based retrieval using the SingleStoreBM25Retriever.
| Option | Description |
|---|---|
| use_fulltext_index | Creates a full-text index (version 2). |
| fulltext_index_options | Specifies a dictionary that contains additional options for configuring the full-text index. |
Hybrid Retrieval
To support hybrid retrieval scenarios, enable both the vector and full-text indexes on the same table so that they can be used together.
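This page does not prescribe how to merge the results of a vector retriever and a BM25 retriever. One common approach is reciprocal rank fusion (RRF), sketched below in plain Python. The function name, the constant k=60, and the document IDs are assumptions for illustration; the two input lists stand in for the ranked outputs of the two retrievers.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one list using RRF."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes more for documents it ranks near the top
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["doc_a", "doc_b", "doc_c"]  # stand-in for vector search results
bm25_hits = ["doc_b", "doc_d", "doc_a"]    # stand-in for full-text search results

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)
```

RRF works on ranks only, which avoids having to reconcile similarity scores and BM25 scores that live on different scales.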
Write Documents
To write documents to SingleStore, use either of the following:
- SingleStoreDocumentStore.write_documents() method
- DocumentWriter component
write_documents() Example
The following example generates the embeddings using SentenceTransformersDocumentEmbedder and then writes the document to SingleStore.
    from haystack import Document
    from haystack.components.embedders import SentenceTransformersDocumentEmbedder
    from haystack_integrations.document_stores.singlestore_haystack import SingleStoreDocumentStore

    # Initialize the document store
    document_store = SingleStoreDocumentStore(
        database_name="haystack_db",  # SingleStore database
        table_name="haystack_documents",  # SingleStore table for Documents
        embedding_dimension=384,  # Dimension of embeddings
    )

    # Create documents
    documents = [
        Document(content="SingleStore is a distributed SQL database built to power intelligent applications.")
    ]

    # Create the document embedder
    document_embedder = SentenceTransformersDocumentEmbedder(
        model="sentence-transformers/all-MiniLM-L6-v2"
    )

    # Download the model and prepare it (first run only)
    document_embedder.warm_up()

    # Generate embeddings
    result = document_embedder.run(documents)
    documents_with_embeddings = result["documents"]

    # Write documents (with embeddings) to SingleStore
    document_store.write_documents(documents_with_embeddings)
DocumentWriter Example
The following example creates a Haystack pipeline to write documents to SingleStore:
    from haystack import Document, Pipeline
    from haystack.components.embedders import SentenceTransformersDocumentEmbedder
    from haystack.components.writers import DocumentWriter
    from haystack_integrations.document_stores.singlestore_haystack import SingleStoreDocumentStore

    # Input documents
    documents = [
        Document(content="SingleStore is a distributed SQL database built to power intelligent applications."),
        Document(content="SingleStore is delivered as a SaaS data platform (SingleStore Helios) and is available in AWS, Azure, and GCP."),
    ]

    # Initialize the document store
    document_store = SingleStoreDocumentStore(
        table_name="haystack_documents",
        embedding_dimension=384,
        recreate_table=True,  # Recreate the table if it already exists
    )

    # Components
    embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
    writer = DocumentWriter(document_store=document_store)

    # Build the pipeline
    pipeline = Pipeline()
    pipeline.add_component(instance=embedder, name="embedder")
    pipeline.add_component(instance=writer, name="writer")
    pipeline.connect("embedder", "writer")

    # Run the indexing pipeline
    result = pipeline.run({"embedder": {"documents": documents}})
    print(result)  # {'writer': {'documents_written': 2}}
`{'writer': {'documents_written': 2}}`
Retrieve Documents
Use the SingleStoreEmbeddingRetriever component to retrieve documents from SingleStore.
For example, consider the following Haystack pipeline that finds documents using vector index and metadata filtering:
    from typing import List

    from haystack import Document, Pipeline
    from haystack.components.embedders import SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder
    from haystack_integrations.components.retrievers.singlestore_haystack import SingleStoreEmbeddingRetriever
    from haystack_integrations.document_stores.singlestore_haystack import SingleStoreDocumentStore

    # Initialize the document store
    document_store = SingleStoreDocumentStore(
        database_name="haystack_db",  # The name of the database in SingleStore
        table_name="haystack_documents",  # The name of the table to store Documents
        embedding_dimension=384,  # The dimension of the embeddings being stored
        recreate_table=True,
    )

    # Sample documents with metadata
    documents = [
        Document(content="My name is Morgan and I live in Paris.", meta={"num_of_years": 3}),
        Document(content="I am Susan and I live in Berlin.", meta={"num_of_years": 7}),
    ]

    # The same model is used for both query and Document embeddings
    model_name = "sentence-transformers/all-MiniLM-L6-v2"

    # Embed and write documents
    document_embedder = SentenceTransformersDocumentEmbedder(model=model_name)
    document_embedder.warm_up()
    documents_with_embeddings = document_embedder.run(documents)
    document_store.write_documents(documents_with_embeddings.get("documents"))
    print("Number of documents written: ", document_store.count_documents())

    # Build the retrieval pipeline
    pipeline = Pipeline()
    pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder(model=model_name))
    pipeline.add_component("retriever", SingleStoreEmbeddingRetriever(document_store=document_store))
    pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

    # Run a query with metadata filtering
    result = pipeline.run(
        data={
            "text_embedder": {"text": "What cities do people live in?"},
            "retriever": {
                "top_k": 5,
                "filters": {"field": "meta.num_of_years", "operator": "==", "value": 3},
            },
        }
    )
    documents: List[Document] = result["retriever"]["documents"]
    print(documents)
    [Document(id=4014455c3be5d88151ba12d734a16754d7af75c691dfc3a5f364f81772471bd2, content: 'My name is Morgan and I live in Paris.', meta: {'num_of_years': 3}, score: 0.339349627494812, embedding: vector of size 384)]

Examples
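The metadata filter passed to the retriever in the pipeline above is a dictionary with field, operator, and value keys. The following is a minimal sketch of how such a filter can be read, evaluated in plain Python against dictionaries that stand in for documents. The matches and lookup helpers are hypothetical, a missing field is treated as a non-match for simplicity, and the document store itself is expected to apply filters inside the database rather than in Python.

```python
import operator

# Comparison operators used in filter dictionaries
COMPARATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def lookup(doc, field):
    """Resolve a dotted path such as 'meta.num_of_years' in a nested dict."""
    value = doc
    for part in field.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def matches(doc, flt):
    """Return True if the document satisfies the filter (hypothetical helper)."""
    op = flt["operator"]
    if op in ("AND", "OR"):
        results = (matches(doc, cond) for cond in flt["conditions"])
        return all(results) if op == "AND" else any(results)
    value = lookup(doc, flt["field"])
    if value is None:
        return False  # simplification: missing fields never match
    return COMPARATORS[op](value, flt["value"])


docs = [
    {"content": "My name is Morgan and I live in Paris.", "meta": {"num_of_years": 3}},
    {"content": "I am Susan and I live in Berlin.", "meta": {"num_of_years": 7}},
]

flt = {"field": "meta.num_of_years", "operator": "==", "value": 3}
selected = [d for d in docs if matches(d, flt)]
print([d["content"] for d in selected])
```

With the equality filter only the first document is selected, which mirrors the single result returned by the retrieval pipeline.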
Refer to the singlestore-haystack GitHub repository for more examples.