SingleStore DB

Working with Geospatial Features

SingleStore has added support for geospatial queries. The term geospatial covers a very broad range of features. There is a formal standardization for some of those features, known as SQL-MM or OpenGIS. The features in SingleStore are not a complete implementation of that standard, but follow the same structure.

At a basic level, geospatial support requires three things: datatypes, topological functions, and measurement functions.

Geospatial Types
Figure 9. image
image

The three main geospatial object types are points, paths, and polygons. We support a subset of the standard Well-Known Text (WKT) syntax to describe them. For example, a POINT is a simple longitude / latitude pair. Note the lack of a comma between the numbers.

"POINT(-74.044514 40.689244)"

A linestring or path object is an ordered list of points. The points (vertices) in a linestring (or polygon) object are connected by edges.

"LINESTRING(0 0, 0 1, 1 1)"

A polygon is similar to a linestring, except that it is closed. That is, it must have at least three unique points, and the first and last point-pairs must be equal:

"POLYGON((0 0, 0 1, 1 1, 0 0))"

You can also describe multiple shapes (or rings) inside a polygon object, i.e., a polygon object can also be represented as a set of rings. For example, a set of islands, or a lake inside a country.

Each ring is represented as a set of points. The first ring (external ring) in the set represents the shape of the polygon. The subsequent rings in the set (internal rings) represent the holes in the shape. The rings within the boundary of a polygon (internal rings) can never cross. They may however intersect at a point, but only as a tangent. Each respective shape embedded within a polygon object is represented using the following notation.

"POLYGON(Ring 0, Ring 1, ... Ring N)"

Note the parenthesis around pairs of numbers, which are separated by commas.

"POLYGON((0 0, 0 10, 10 10, 10 0, 0 0), (5 5, 6 5, 5 6, 5 5))"

In the above example, rings (0 0, 0 10, 10 10, 10 0, 0 0) and (5 5, 6 5, 5 6, 5 5) represent Ring 0 and Ring 1, respectively.

All geospatial functions can accept either a geospatial object or the equivalent WKT string as arguments. SingleStore does not support the MULTIPOLYGON, GEOMETRY, CURVE, or other keywords.



Creating a Geospatial Table

Geospatial objects are just another data type in SingleStore and can be used right alongside numbers, strings, and JSON. To create a column that holds point data, use the GeographyPoint type. For path and polygon data, use the Geography type.

Important

It is possible to store POINTs in the Geography type, but is not recommended. The GeographyPoint type was designed specifically for point data, and is highly optimized for memory efficiency and speed.

In this example, we’ll create a table of neighborhoods. The centroid field is the point that is roughly in the middle of the borders of the neighborhood. The shape field is the polygon describing those borders. All geospatial data can be indexed for very high query performance.

CREATE TABLE neighborhoods (
  id INT UNSIGNED NOT NULL PRIMARY KEY,
  name VARCHAR(64) NOT NULL,
  population INT UNSIGNED NOT NULL,
  shape GEOGRAPHY NOT NULL,
  centroid GEOGRAPHYPOINT NOT NULL,

  index (shape) WITH (resolution = 8),
  index (centroid)
  );

  INSERT INTO neighborhoods VALUES
  (1, "Hamilton", 12346, "POLYGON((1 1,2 1,2 2, 1 2, 1 1))","POINT(1.5 1.5)"),
  (2, "Probability Bay", 263951, "POLYGON((5 1,6 1,6 2,5 2,5 1))", "POINT(5.5 1.5)"),
  (3, "Decentralized Park", 29265, "POLYGON((5 5,6 5,6 6,5 6,5 5))", "POINT(5.5 5.5)"),
  (4, "Axiom Township", 845696, "POLYGON((1 5,2 5,2 6,1 6,1 5))", "POINT(1.5 5.5)"),
  (5, "Elegant Island ", 987654, "POLYGON((3 3,4 3,4 4,3 4,3 3))", "POINT(3.5 3.5)");
Querying Geospatial Data
Figure 10. image
image


SingleStore provides a SQL function for each of these types of queries:

You can run queries over the neighborhood data to see how they relate to each other spatially. For example, you can join the table to itself and measure distances between neighborhoods:

SELECT b.name AS town,
  ROUND(GEOGRAPHY_DISTANCE(a.centroid, b.centroid), 0) AS distance_from_center,
  ROUND(GEOGRAPHY_DISTANCE(a.shape, b.shape), 0) AS distance_from_border
  FROM neighborhoods a, neighborhoods b
  WHERE a.id = 2
  ORDER BY 2;

  +--------------------+----------------------+----------------------+
  | town               | distance_from_center | distance_from_border |
  +--------------------+----------------------+----------------------+
  | Probability Bay    |                    0 |                    0 |
  | Elegant Island     |               314174 |               157090 |
  | Hamilton           |               444379 |               333195 |
  | Decentralized Park |               444531 |               333399 |
  | Axiom Township     |               628012 |               471030 |
  +--------------------+----------------------+----------------------+
  5 rows in set (0.00 sec)

Important

Distance is calculated using the standard metric for distance on a sphere.

You can also find out where you are with GEOGRAPHY_INTERSECTS:

SELECT id, name FROM neighborhoods WHERE
  GEOGRAPHY_INTERSECTS("POINT(2 1)", shape);

  +----+----------+
  | id | name     |
  +----+----------+
  |  1 | Hamilton |
  +----+----------+
  1 row in set (0.00 sec)
Spatial Joins

Geospatial objects are first-class datatypes in SingleStore, which means that you can use spatial relationships to join tables. For example, to find all of the businesses in your neighborhood, you join the businesses table with the neighborhoods table on whether the business location (a point) is contained by the neighborhood’s shape (a polygon).

Create the table:

CREATE TABLE businesses (
    id INT UNSIGNED NOT NULL PRIMARY KEY,
    name VARCHAR(64) NOT NULL,
    revenue INT UNSIGNED NOT NULL,
    location GEOGRAPHYPOINT NOT NULL,
    INDEX (location)
);

INSERT INTO businesses VALUES
   (1,"Touring Machines Bike Repair", 10000, "POINT(1.25 1.25)"),
   (2,"Manhattan Boat Club", 20000, "POINT(5.75 5.25)"),
   (3,"The Philospher's Diner", 12345, "POINT(1.25 5.25)"),
   (4,"Push & Pop Cafe", 23456, "POINT(3.25 3.75)"),
   (5,"Lil Mac's Subs", 76543, "POINT(5.75 5.75)"),
   (6,"Touring Machines Bike Repair", 21012, "POINT(5.25 1.75)"),
   (7,"Random Slice Pizza", 11111, "POINT(1.75 1.75)")
;

Find businesses contained within a specific neighborhood:

SELECT b.id, b.name FROM businesses b, neighborhoods n
  WHERE n.name="Elegant Island" AND
  GEOGRAPHY_CONTAINS(n.shape, b.location);

  +----+-----------------+
  | id | name            |
  +----+-----------------+
  |  4 | Push & Pop Cafe |
  +----+-----------------+
  1 row in set (0.00 sec)
SELECT b.id, b.name FROM businesses b, neighborhoods n
  WHERE n.name="Hamilton" AND
  GEOGRAPHY_CONTAINS(n.shape, b.location)
  ORDER BY 1;

  +----+------------------------------+
  | id | name                         |
  +----+------------------------------+
  |  1 | Touring Machines Bike Repair |
  |  7 | Random Slice Pizza           |
  +----+------------------------------+
  2 rows in set (0.03 sec)

These functions and many more are documented in Geospatial Functions.

Geospatial Model

SingleStore uses a spherical model similar to that used in Google Earth. It assumes a perfectly spherical Earth with a radius of 6,367,444.66 meters. The real Earth is not a sphere. It resembles an ellipsoid slightly wider than it is tall, with many local variations. In a spherical model, measurements of distance and area may vary from reality. For example, a 10,000 meter line drawn vertically over San Francisco would measure about 10,020 meters in this model. Topological functions, which describe the relationships between shapes and points, less affected by this difference.

Columnstore Geospatial

In SingleStore, geospatial data and functions are supported in columnstore (i.e. on-disk) tables. Columnstore tables do not support indexes on geo data. The native GeographyPoint type is supported but the Geography type is not. However, you can store path and polygon data in a plain text field, and run geospatial functions over them just as you do with rowstore (in-memory) tables.

CREATE TABLE neighborhoods_colstore(
  id INT UNSIGNED NOT NULL,
  name VARCHAR(64) NOT NULL,
  population INT UNSIGNED NOT NULL,
  shape TEXT NOT NULL,
  centroid GEOGRAPHYPOINT NOT NULL,
  key (name) USING clustered columnstore,shard key (id));

INSERT INTO neighborhoods_colstore VALUES
  (1, "Hamilton", 12346, "POLYGON((1 1,2 1,2 2, 1 2, 1 1))","POINT(1.5 1.5)"),
  (2, "Probability Bay", 263951, "POLYGON((5 1,6 1,6 2,5 2,5 1))", "POINT(5.5 1.5)"),
  (3, "Decentralized Park", 29265, "POLYGON((5 5,6 5,6 6,5 6,5 5))", "POINT(5.5 5.5)"),
  (4, "Axiom Township", 845696, "POLYGON((1 5,2 5,2 6,1 6,1 5))", "POINT(1.5 5.5)"),
  (5, "Elegant Island ", 987654, "POLYGON((3 3,4 3,4 4,3 4,3 3))", "POINT(3.5 3.5)");

SELECT id, name FROM neighborhoods_colstore WHERE
  GEOGRAPHY_INTERSECTS("POINT(2 1 )", shape);
Special Spatial Tuning Parameters

Spatial indices on Geography columns may be given an optional RESOLUTION parameter. You can choose a resolution between 6 and 32. This controls how many pieces the polygons and linestrings are broken up into for indexing. As a general rule, a lower number means less memory consumption, faster inserts and updates, but slower query times. A higher number may increase query performance at a cost of memory and insert performance.

Important

These tuning parameters should only be adjusted if you have a measurable need for more performance and have worked through the tradeoffs between ingest, computational load, and memory consumption.

For example, it may be useful to compare the percentage difference in the number of rows returned by GEOGRAPHY_INTERSECTS and APPROX_GEOGRAPHY_INTERSECTS.This difference is thefalse positive" rate of the index for a given resolution. For example, at resolution 16, suppose that 10% more records are returned by the approximate function. At resolution 8, the rate only grows to 11%. In that case you should be able to save memory and increase ingest throughput by using the lower number, without significantly harming query performance. Again, this kind of tuning is highly dependent on your particular dataset and workload. The parameters obtained for one application may not apply to another. Your mileage may vary.

You can also change the resolution of polygons passed into geospatial functions at query time. In this example, the resolution of the shape column is unchanged, but the POLYGON in the second argument is broken into 16 parts. Each part is queried separately. A lower number will result in fewer index lookup operations, but may return more false positive candidates from the index which are then filtered out of the result set in the second phase of query execution.

SELECT *
  FROM neighborhoods WITH (index = shape, resolution = 16)
  WHERE geography_intersects(shape,"POLYGON(...)");
Using GeoJSON Data

SingleStore does not have native GeoJSON support. However, we have generic JSON support as well as computed columns. Combining these features, you can import a subset of GeoJSON data using SingleStore’s built-in JSON type and GEOGRAPHYPOINT type.

For example, given a set of GeoJSON points like this:

{
    "type": "Feature",
    "geometry": {
      "type": "Point",
      "coordinates": [123.456, 78.901]
    },
    "properties": {
      "id": 123456,
      "name": "Probability Bay Naval Observatory"
    }
  }

You can construct a table like this:

CREATE TABLE geostuff (
  id AS dataz::properties::%id persisted BIGINT unsigned,
  name AS dataz::properties::$name persisted VARCHAR(128),
  location AS geography_point(
    dataz::geometry::coordinates::%`0`,
    dataz::geometry::coordinates::%`1`
  ) persisted geographypoint,
  dataz JSON NOT NULL,
  index (location)
  );

And then insert the JSON string:

INSERT INTO geostuff (dataz) VALUES ('{"type": "Feature", "geometry": {"type": "Point","coordinates": [123.456, 78.901]},"properties": {"id": 123456, "name": "Probability Bay Naval Observatory" }}');

SELECT id, name, location FROM geostuff;

  +--------+-----------------------------------+---------------------------------+
  | id     | name                              | location                        |
  +--------+-----------------------------------+---------------------------------+
  | 123456 | Probability Bay Naval Observatory | POINT(123.45600012 78.90100000) |
  +--------+-----------------------------------+---------------------------------+