Glossary

Aggregate

The task of collecting a set of values to return a single value. When data is aggregated, data rows are replaced with totals or summary statistics.
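
As a minimal sketch of aggregation, the example below uses Python's built-in `sqlite3` module as a stand-in for a SQL database. The `orders` table and its values are hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical table and values, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 10.0), ("ann", 30.0), ("bob", 5.0)],
)

# Three data rows are replaced by one summary row per customer.
totals = conn.execute(
    "SELECT customer, SUM(amount), COUNT(*) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```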

Aggregator node

A node that routes queries to the leaf nodes, aggregates intermediate results, and sends the results back to a client. There are two types of aggregators: master and child.

Approximate Nearest Neighbor

A technique used in computational geometry and machine learning to find the approximate nearest neighbors in high-dimensional spaces.

Background merger

An algorithm used by SingleStore that keeps columnstore segments as close to sorted order as possible while data is being ingested or updated.

Binary JavaScript Object Notation

A binary-encoded serialization of JSON-like documents. It is designed to be efficient in space but also rich in its ability to represent more data types than JSON.

Centroids

Points used in clustering algorithms, each representing the center, or mean, of a cluster (a group of vectors near each other).
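
A centroid can be sketched as the component-wise mean of the vectors in a cluster. The function name `centroid` and the sample vectors are hypothetical.

```python
from statistics import fmean


def centroid(vectors):
    """Return the component-wise mean of equal-length vectors."""
    return [fmean(component) for component in zip(*vectors)]


# Hypothetical cluster of three 2-dimensional vectors.
cluster = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
center = centroid(cluster)
print(center)
```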

Child aggregator

A node that can be promoted to the role of Master Aggregator in the event that the existing Master Aggregator’s host fails. Depending on the query volume, a cluster may contain zero or more child aggregators.

Cluster

A collection of SingleStore aggregator and leaf nodes.

Code generation

The use of an industrial compiler to produce highly efficient machine code, which enables low-level optimizations that are not possible when executing queries via interpretation alone. By default, queries are interpreted first and then asynchronously compiled in the background for use in later executions. This speeds up execution of long and complex queries while also providing efficient query plans for later use.

Common Table Expression

A named temporary result set that exists within the scope of a single statement and that can be referred to later within that statement, possibly multiple times.
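
The sketch below shows a CTE that is defined once and referenced twice within the same statement. It runs against SQLite through Python's `sqlite3` module as a stand-in. The `sales` table and the CTE name `region_totals` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100), ("east", 50), ("west", 70)],
)

# region_totals exists only for the duration of this one statement.
query = """
WITH region_totals AS (
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
)
SELECT region, total
FROM region_totals
WHERE total = (SELECT MAX(total) FROM region_totals)
"""
top_regions = conn.execute(query).fetchall()
print(top_regions)
```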

Database branching

A feature that creates private, independent copies of a database including all of its data.

Database user

A user that resides in the cluster. A database user’s lifetime is bound to that cluster such that, when a cluster is terminated, all of the users, permissions, and groups are permanently removed as well. Database users can connect to a cluster via SQL client and run SQL queries against their data. Unlike organization users, database users must be managed via SQL statements.

Deadlock

A situation in which two or more transactions each hold a resource (such as a table write) that another of the transactions is requesting, so that none of them can proceed.

Deterministic

An operation or function that will always produce the same result for the same input values.
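
As a small illustration, the two hypothetical functions below differ only in that the second depends on a random value, so it is not deterministic.

```python
import random


def area(width, height):
    # Deterministic: the result depends only on the input values.
    return width * height


def jittered_area(width, height):
    # Not deterministic: random.random() can differ on every call.
    return width * height + random.random()


print(area(3, 4), area(3, 4))
print(jittered_area(3, 4), jittered_area(3, 4))
```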

Full backup

A stored, complete copy of a database.

Garbage Collection

The process where unneeded versioned nodes for multi-version concurrency control (MVCC) are eliminated. These versioned nodes may be part of skip list indexes or hash indexes for in-memory rowstore tables or columnstore segments stored in memory.

Globbing

The use of wildcard patterns to match file names. Globbing is commonly used in data ingest to read or select a subset of files based on a naming pattern, which allows large numbers of files to be processed quickly and efficiently, especially when the file names follow a predictable pattern.
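
The sketch below selects files by pattern with Python's standard `fnmatch` module. The file names and the pattern are hypothetical, and the pattern syntax accepted by a given ingest tool may differ.

```python
from fnmatch import fnmatchcase

# Hypothetical file listing.
files = [
    "sales_2024_01.csv",
    "sales_2024_02.csv",
    "sales_2023_12.csv",
    "readme.txt",
]

# "*" matches any run of characters within the name.
pattern = "sales_2024_*.csv"
matched = [name for name in files if fnmatchcase(name, pattern)]
print(matched)
```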

Hadoop Distributed File System

A distributed file system by Apache Hadoop. It is highly fault tolerant and designed to run on COTS (Commercial Off the Shelf) or low cost out-of-the-box hardware.

Hash index

A data structure optimized for fast equality lookups by a key.
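
A rough sketch of the idea, using a Python dictionary as the hash table. The rows, the indexed column, and the `lookup` helper are hypothetical, and this is not a description of how SingleStore implements its hash indexes.

```python
from collections import defaultdict

# Hypothetical rows.
rows = [
    {"id": 1, "city": "Lisbon"},
    {"id": 2, "city": "Osaka"},
    {"id": 3, "city": "Lisbon"},
]

# Build the index: key value -> positions of the rows with that value.
city_index = defaultdict(list)
for position, row in enumerate(rows):
    city_index[row["city"]].append(position)


def lookup(city):
    # An equality lookup hashes the key once instead of scanning every row.
    return [rows[i] for i in city_index.get(city, [])]


print(lookup("Lisbon"))
```

Because the index is keyed by hash, it helps with equality predicates but not with range predicates such as "greater than".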

Hierarchical data

A set of data items that are related to each other by hierarchical relationships. A hierarchical relationship exists where one item of data is the parent of one or more other items.

Hierarchical Navigable Small World

An algorithm used for approximate nearest neighbor (ANN) search, particularly in high-dimensional spaces. It is well suited to applications that require fast approximate nearest neighbor queries, such as similarity search in large-scale datasets. HNSW is known for its efficiency in handling high-dimensional data, making it a good choice for machine learning, data mining, and information retrieval.

High-dimensional spaces

Datasets with a large number of features or attributes, where each feature represents a separate dimension.

Host

A hardware or virtual machine which holds the aggregator and leaf nodes that comprise a SingleStore cluster.

Incremental backup

A stored copy of only the data that has been modified since the most recent backup.

Information schema

Holds the information or metadata for all the databases in a cluster.

Inline view

A SELECT statement embedded in the FROM or WITH clause of another SELECT statement that creates a temporary table that is operated on by the outer query.
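
The sketch below embeds a SELECT in the FROM clause of an outer query. It runs against SQLite through Python's `sqlite3` module as a stand-in. The `employees` table and the alias `dept_avg` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("eng", 80), ("eng", 100), ("ops", 50)],
)

# The inner SELECT acts as a temporary table that the outer query filters.
query = """
SELECT dept, avg_salary
FROM (
    SELECT dept, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY dept
) AS dept_avg
WHERE avg_salary > 60
ORDER BY dept
"""
high_paying = conn.execute(query).fetchall()
print(high_paying)
```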

In-place change

An in-place change in a database refers to modifying the data or schema of the database without requiring a full data migration or recreation of the database. It allows you to make changes to the database while minimizing downtime and preserving existing data.

Inverted file with product quantization

A method used for approximate nearest neighbor search in large-scale datasets, particularly in high-dimensional spaces.

JavaScript Object Notation

An open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays.
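
A short round trip with Python's standard `json` module shows attribute-value pairs and an array serialized to text and read back. The record contents are hypothetical.

```python
import json

# Hypothetical record with attribute-value pairs and an array.
record = {"name": "sensor-1", "tags": ["a", "b"], "reading": 21.5}

text = json.dumps(record, sort_keys=True)  # human-readable text form
restored = json.loads(text)                # back to Python objects

print(text)
```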

JSON Web Key Sets

A set containing the public keys that can be used to authenticate a JWT.

JSON Web Token

An open industry standard typically used for authorization and information exchange.

Leaf node

A node that stores a subset of a cluster’s data. A leaf node functions as a storage and compute node. To optimize performance, SingleStore automatically distributes data across leaf nodes into partitions. Each leaf node contains several partitions.

Lock-free backups

Backup operations that do not block INSERT, UPDATE, and DELETE operations during the backup.

Low Level Virtual Machine

A collection of modular and reusable compiler and toolchain technologies used for developing compiler frontends and backends. SingleStore includes an LLVM-based code generation framework that is used to compile queries to machine code.

The project has outgrown the original name, and now LLVM is just its name, not an acronym.

Master aggregator

A specialized node that’s responsible for cluster monitoring and failover. It orchestrates basic cluster operations and all DDL operations.

Metadata

Information about a database's schema, access to the database, storage, and built-in programs, as well as other information about the data, such as creation date and file size.

Multi-Version Concurrency Control

A method used to increase transaction concurrency and reduce response time for read-only transactions by maintaining a history of versions of each row in a table.
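
The idea of keeping a history of row versions can be sketched as follows. The class `VersionedRow`, its methods, and the integer timestamps are hypothetical, and the sketch assumes writes arrive in increasing timestamp order. It does not describe SingleStore's internal implementation.

```python
class VersionedRow:
    """Keeps every committed version of one row (illustrative only)."""

    def __init__(self):
        # (commit_ts, value) pairs, assumed appended in ascending order.
        self._versions = []

    def write(self, commit_ts, value):
        # Writers add a new version instead of overwriting the old one.
        self._versions.append((commit_ts, value))

    def read(self, snapshot_ts):
        # Readers see the newest version committed at or before their snapshot.
        visible = None
        for commit_ts, value in self._versions:
            if commit_ts <= snapshot_ts:
                visible = value
        return visible


row = VersionedRow()
row.write(10, "v1")
row.write(20, "v2")
print(row.read(15), row.read(25))
```

A reader holding snapshot 15 keeps seeing `v1` even after `v2` is written, which is why read-only transactions do not need to wait for writers.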

Node

A SingleStore server. A host may contain one or more SingleStore server instances.

Non-Uniform Memory Access

A computer memory design used in multiprocessing. Through NUMA, a processor can access memory that’s considered “local memory” faster than it can access memory that is local to another processor, or memory that is shared between processors.

Normalized

Organizing data to appear similar across all records and fields.

Object store

A data storage architecture that manages data as objects, as opposed to other storage architectures such as file systems, which manage data as a file hierarchy, and block storage, which manages data as blocks within sectors and tracks.

Online Analytical Processing

A data processing type that is designed to analyze data dimensions concurrently.

Online Transaction Processing

A data processing type that executes a number of transactions occurring concurrently.

Organization

Allows shared access to clusters and related resources within a company or group.

Organization user

An organization user resides within the “control plane” of SingleStore and can perform those actions that are available on the Cloud Portal, such as manage organization users, manage clusters, and run SQL queries against cluster data via the Cloud Portal SQL Editor.

Partition

A partition contains a subset (a shard) of a database’s data. Each partition holds a horizontal slice of the data (a subset of rows), distributed according to a hashing algorithm on the primary key, or randomly for keyless sharded databases.
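
Hash-based distribution of rows to partitions can be sketched as below. The partition count, the use of SHA-256, and the function name `partition_for` are assumptions made for illustration. The hash function SingleStore actually uses is not stated here.

```python
import hashlib

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_for(key: str) -> int:
    """Map a key to a partition number using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


# The same key always lands in the same partition.
for key in ["user-1", "user-2", "user-3"]:
    print(key, partition_for(key))
```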

Persisting the name of a file

Persisting the name of a file is storing the file name so it can be retrieved and used again later in a program or system.

Pipelines

A feature that continuously loads data as it arrives from external sources. As a built-in component of the database, Pipelines can extract, shape (modify), and load external data without the need for third-party tools or middleware.

Point-in-time recovery

A user-initiated operation that allows a set of data in a database to be recovered to a specific timestamp in the past.

Procedural SQL

A set of programming extensions for SingleStore that allow developers to write code in a procedural format.

Product Quantization

A technique used for vector compression. It is very effective in compressing high-dimensional vectors for nearest neighbor search.
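
A minimal sketch of the encoding step, assuming the codebooks have already been learned. The vector is split into sub-vectors, and each sub-vector is replaced by the index of its nearest codeword. The codebooks, the vector, and the function name `quantize` are hypothetical.

```python
def quantize(vector, codebooks):
    """Encode a vector as one codeword index per sub-vector."""
    sub_len = len(vector) // len(codebooks)
    codes = []
    for i, codebook in enumerate(codebooks):
        sub = vector[i * sub_len:(i + 1) * sub_len]
        # Pick the codeword with the smallest squared Euclidean distance.
        best = min(
            range(len(codebook)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(sub, codebook[j])),
        )
        codes.append(best)
    return codes


# Two hypothetical codebooks, one per 2-dimensional sub-vector.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [5.0, 5.0]],
]
codes = quantize([0.9, 1.2, 4.0, 4.5], codebooks)
print(codes)
```

The four floating-point components are stored as two small integers, which is where the compression comes from.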

Cloud Portal

An application that allows you to design, manage, and monitor your SingleStore instances.

Cloud Portal user

A user that can log into the Cloud Portal and access portal services. Each Cloud Portal user is associated with a default organization and automatically has access to all of the clusters and related resources within it.

Query shape

Patterns or structures in a query. Some query shapes are unsupported in SingleStore.

Random-Access Memory

A computer's short-term memory. It is where the data that the processor is currently using is stored temporarily. RAM can be accessed much faster than data on a hard disk, solid-state disk, or another long-term storage device, which is why RAM capacity is so important for system performance.

Replication

Ensures redundancy in a cluster. There are two types of replication: high availability, which replicates partitions between the leaf nodes, and cluster replication, which replicates partitions between clusters.

Segment elimination

A process where metadata stored for columnstore segments is used to determine whether a segment can match a filter queried at execution time.
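
The sketch below assumes that the metadata is a minimum and maximum value per segment for the filtered column. The segment list and the function name `segments_to_scan` are hypothetical.

```python
# Hypothetical per-segment metadata for one column.
segments = [
    {"id": 0, "min": 1, "max": 100},
    {"id": 1, "min": 101, "max": 200},
    {"id": 2, "min": 201, "max": 300},
]


def segments_to_scan(low, high):
    """Return ids of segments whose value range overlaps [low, high]."""
    return [s["id"] for s in segments if s["max"] >= low and s["min"] <= high]


# A filter such as "value BETWEEN 150 AND 250" can skip segment 0 entirely.
print(segments_to_scan(150, 250))
```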

Serializable isolation

An isolation level that provides the strictest transaction isolation. This level guarantees that transactions executed in parallel produce the same result as if they were executed serially (one at a time).

Shard

A subset of a database's data.

Smart Disaster Recovery

A process that handles the continuous asynchronous replication of data between a primary and a secondary region. A primary region is the main geographic location where your database(s) currently reside and operate, while a secondary region is an additional, geographically separate location to which your database(s) are replicated for disaster recovery purposes.

Solid-State Drive

A solid-state drive is a storage device. It is a non-volatile medium that stores persistent data on solid-state flash memory.

Single sign-on

An authentication method that allows users to log in to SingleStore through an identity provider such as Azure AD, Okta, or PingOne.

Skiplist index

A data structure optimized for ordered data that allows for queries to quickly seek data by binary searching.

SQL surface area

The number of components that are installed and configuration options that are enabled.

Sharding

A type of database partitioning that divides a database into smaller, more easily manageable parts.

Unlimited storage

An unlimited amount of storage space in the cloud, with data moving seamlessly between memory, persistent cache, and storage.

Unlimited storage database

A database whose size is not limited by the size of the persistent cache, but only by available external object storage.