Working with Geospatial Features

SingleStore supports geospatial queries. The term geospatial covers a very broad range of features. There is a formal standardization for some of those features, known as SQL-MM or OpenGIS. The features in SingleStore are not a complete implementation of that standard, but follow the same structure. SingleStore uses the S2Geometry library.

At a basic level, geospatial support requires three things: data types, topological functions, and measurement functions.

Geospatial Types

The three main geospatial object types are points, paths, and polygons. SingleStore supports a subset of the standard Well-Known Text (WKT) syntax to describe them.

All geospatial functions can accept either a geospatial object or the equivalent WKT string as arguments. SingleStore does not support MULTIPOLYGON, GEOMETRY, CURVE, or other keywords.

Point

A POINT is a simple longitude / latitude pair. Note the lack of a comma between the numbers. Also note that the ordering in the pair is significant, the first value is the longitude and the second value is the latitude.

"POINT(-74.044514 40.689244)"

Path or Linestring

A linestring or path object is an ordered list of points. The points (vertices) in a linestring (or polygon) object are connected by edges.

"LINESTRING(0 0, 0 1, 1 1)"

Polygon

A polygon is similar to a linestring, except that it is closed. That is, it must have at least three unique points, and the first and last point-pairs must be equal:

"POLYGON((0 0, 0 1, 1 1, 0 0))"

When defining a polygon (or loop), the area inside the polygon's edges is the "smaller" portion of the globe. For example, if you define a polygon that outlines the shape of Australia, there could be two possible sides: All of the globe except Australia, or Australia itself. Because the side with the smaller area contains Australia, that is what is considered inside the polygon. Because of this, it is not possible to select more than half of the globe inside a polygon.

You can also describe multiple shapes (or rings) inside a polygon object, i.e., a polygon object can also be represented as a set of rings. For example, a set of islands, or a lake inside a country.

Each ring is represented as a set of points. The first ring (external ring) in the set represents the shape of the polygon. The subsequent rings in the set (internal rings) represent the holes in the shape. The rings within the boundary of a polygon (internal rings) can never cross. They may however intersect at a point, but only as a tangent. Each respective shape embedded within a polygon object is represented using the following notation.

"POLYGON(Ring 0, Ring 1, ... Ring N)"

Note the parenthesis around pairs of numbers, which are separated by commas.

"POLYGON((0 0, 0 10, 10 10, 10 0, 0 0), (5 5, 6 5, 5 6, 5 5))"

In the above example, rings (0 0, 0 10, 10 10, 10 0, 0 0) and (5 5, 6 5, 5 6, 5 5) represent Ring 0 and Ring 1, respectively.

Edges

Edges in SingleStore are "spherical geodesics", i.e. they are the shortest paths on the sphere and therefor they do not always follow raw longitude/latitude lines like "plate carrée" projection edges do.

This can lead to unexpected results when looking for intersection, etc. as the shapes formed from these edges in SingleStore may differ from those drawn in other applications and viewer/visualization tools that use "plate carrée" projection edges. For that reason, when testing it is best to use a viewer/application that uses spherical geodesics. Note that using a viewer or other application is not necessary -- it's used here just to show the difference in behavior between geodesics and plate projection edges.

For example, in the two images below we have a polygon outlining the country of Sweden and a second polygon checking for intersection. The same polygon definitions are used in both. We are using WKT viewers to demonstrate the difference in behavior based on the edge algorithm being used. Here is the polygon definition for the intersection shape:

POLYGON ((-0.1 66.41326, 90.1 66.41326, 90.1 85.15113, -0.1 85.15113, -0.1 66.41326))

In the first image, the two polygons intersect. That is because the viewer is using "plate carrée" projection edges.

Two polygons that intersect. The first is the outline of the country of Sweden and the second is a square that crosses the northern half of Sweden.

In this next view, the polygons do not intersect. The viewer is using geodesic edges:

Two polygons that do not intersect. One is the outline of the country of Sweden and the second is a trapezoid shape that passes north of the first polygon.

Because SingleStore uses geodesic edges, there would not be an intersection between the two polygons.

Creating a Geospatial Table

Geospatial objects are just another data type in SingleStore and can be used right alongside numbers, strings, and JSON. To create a column that holds point data, use the GeographyPoint type. For path and polygon data, use the Geography type.

Important

It is possible to store POINTs in the Geography type, but is not recommended. The GeographyPoint type was designed specifically for point data, and is highly optimized for memory efficiency and speed.

In this example, we’ll create a table of neighborhoods. The centroid field is the point that is roughly in the middle of the borders of the neighborhood. The shape field is the polygon describing those borders. All geospatial data can be indexed for very high query performance.

CREATE ROWSTORE TABLE neighborhoods (
id INT UNSIGNED NOT NULL PRIMARY KEY,
name VARCHAR(64) NOT NULL,
population INT UNSIGNED NOT NULL,
shape GEOGRAPHY NOT NULL,
centroid GEOGRAPHYPOINT NOT NULL,
index (shape) WITH (resolution = 8),
index (centroid)
);
INSERT INTO neighborhoods VALUES
(1, "Hamilton", 12346, "POLYGON((1 1,2 1,2 2, 1 2, 1 1))","POINT(1.5 1.5)"),
(2, "Probability Bay", 263951, "POLYGON((5 1,6 1,6 2,5 2,5 1))", "POINT(5.5 1.5)"),
(3, "Decentralized Park", 29265, "POLYGON((5 5,6 5,6 6,5 6,5 5))", "POINT(5.5 5.5)"),
(4, "Axiom Township", 845696, "POLYGON((1 5,2 5,2 6,1 6,1 5))", "POINT(1.5 5.5)"),
(5, "Elegant Island ", 987654, "POLYGON((3 3,4 3,4 4,3 4,3 3))", "POINT(3.5 3.5)");

Querying Geospatial Data

SingleStore provides a SQL function for each of these types of queries:

You can run queries over the neighborhood data to see how they relate to each other spatially. For example, you can join the table to itself and measure distances between neighborhoods:

SELECT b.name AS town,
ROUND(GEOGRAPHY_DISTANCE(a.centroid, b.centroid), 0) AS distance_from_center,
ROUND(GEOGRAPHY_DISTANCE(a.shape, b.shape), 0) AS distance_from_border
FROM neighborhoods a, neighborhoods b
WHERE a.id = 2
ORDER BY 2;
+--------------------+----------------------+----------------------+
| town               | distance_from_center | distance_from_border |
+--------------------+----------------------+----------------------+
| Probability Bay    |                    0 |                    0 |
| Elegant Island     |               314174 |               157090 |
| Hamilton           |               444379 |               333195 |
| Decentralized Park |               444531 |               333399 |
| Axiom Township     |               628012 |               471030 |
+--------------------+----------------------+----------------------+
5 rows in set (0.00 sec)

Important

Distance is calculated using the standard metric for distance on a sphere.

You can also find out where you are with GEOGRAPHY_INTERSECTS:

SELECT id, name FROM neighborhoods WHERE
GEOGRAPHY_INTERSECTS("POINT(2 1)", shape);
+----+----------+
| id | name     |
+----+----------+
|  1 | Hamilton |
+----+----------+
1 row in set (0.00 sec)

Spatial Joins

Geospatial objects are first-class data types in SingleStore, which means that you can use spatial relationships to join tables. For example, to find all of the businesses in your neighborhood, you join the businesses table with the neighborhoods table on whether the business location (a point) is contained by the neighborhood's shape (a polygon).

Create the table:

CREATE ROWSTORE TABLE geo_businesses (
id INT UNSIGNED NOT NULL PRIMARY KEY,
name VARCHAR(64) NOT NULL,
revenue INT UNSIGNED NOT NULL,
location GEOGRAPHYPOINT NOT NULL,
INDEX (location)
);
INSERT INTO geo_businesses VALUES
(1,"Touring Machines Bike Repair", 10000, "POINT(1.25 1.25)"),
(2,"Manhattan Boat Club", 20000, "POINT(5.75 5.25)"),
(3,"The Philospher's Diner", 12345, "POINT(1.25 5.25)"),
(4,"Push & Pop Cafe", 23456, "POINT(3.25 3.75)"),
(5,"Lil Mac's Subs", 76543, "POINT(5.75 5.75)"),
(6,"Touring Machines Bike Repair", 21012, "POINT(5.25 1.75)"),
(7,"Random Slice Pizza", 11111, "POINT(1.75 1.75)")
;

Find businesses contained within a specific neighborhood:

SELECT b.id, b.name FROM geo_businesses b, neighborhoods n
WHERE n.name="Elegant Island" AND
GEOGRAPHY_CONTAINS(n.shape, b.location);
+----+-----------------+
| id | name            |
+----+-----------------+
|  4 | Push & Pop Cafe |
+----+-----------------+
SELECT b.id, b.name FROM geo_businesses b, neighborhoods n
WHERE n.name="Hamilton" AND
GEOGRAPHY_CONTAINS(n.shape, b.location)
ORDER BY 1;
+----+------------------------------+
| id | name                         |
+----+------------------------------+
|  1 | Touring Machines Bike Repair |
|  7 | Random Slice Pizza           |
+----+------------------------------+

These functions and many more are documented in Geospatial Functions.

Geospatial Model

SingleStore uses a spherical model similar to that used in Google Earth. It assumes a perfectly spherical Earth with a radius of 6,367,444.66 meters. The real Earth is not a sphere. It resembles an ellipsoid slightly wider than it is tall, with many local variations. In a spherical model, measurements of distance and area may vary from reality. For example, a 10,000 meter line drawn vertically over San Francisco would measure about 10,020 meters in this model. Topological functions, which describe the relationships between shapes and points, less affected by this difference.

Columnstore Geospatial

In SingleStore, geospatial data and functions are supported in columnstore (i.e. on-disk) tables. Columnstore tables do not support indexes on geo data. The native GeographyPoint type is supported but the Geography type is not. However, you can store path and polygon data in a plain text field, and run geospatial functions over them just as you do with rowstore (in-memory) tables.

CREATE TABLE neighborhoods_colstore(
id INT UNSIGNED NOT NULL,
name VARCHAR(64) NOT NULL,
population INT UNSIGNED NOT NULL,
shape TEXT NOT NULL,
centroid GEOGRAPHYPOINT NOT NULL,
sort key (name), shard key (id));
INSERT INTO neighborhoods_colstore VALUES
(1, "Hamilton", 12346, "POLYGON((1 1,2 1,2 2, 1 2, 1 1))","POINT(1.5 1.5)"),
(2, "Probability Bay", 263951, "POLYGON((5 1,6 1,6 2,5 2,5 1))", "POINT(5.5 1.5)"),
(3, "Decentralized Park", 29265, "POLYGON((5 5,6 5,6 6,5 6,5 5))", "POINT(5.5 5.5)"),
(4, "Axiom Township", 845696, "POLYGON((1 5,2 5,2 6,1 6,1 5))", "POINT(1.5 5.5)"),
(5, "Elegant Island ", 987654, "POLYGON((3 3,4 3,4 4,3 4,3 3))", "POINT(3.5 3.5)");
SELECT id, name FROM neighborhoods_colstore WHERE
GEOGRAPHY_INTERSECTS("POINT(2 1 )", shape);
+----+----------+
| id | name     |
+----+----------+
|  1 | Hamilton |
+----+----------+

Special Spatial Tuning Parameters

Spatial indices on Geography columns may be given an optional RESOLUTION parameter. You can choose a resolution between 6 and 32. This controls how many pieces the polygons and linestrings are broken up into for indexing. As a general rule, a lower number means less memory consumption, faster inserts and updates, but slower query times. A higher number may increase query performance at a cost of memory and insert performance.

Important

These tuning parameters should only be adjusted if you have a measurable need for more performance and have worked through the tradeoffs between ingest, computational load, and memory consumption.

For example, it may be useful to compare the percentage difference in the number of rows returned by GEOGRAPHY_INTERSECTS and APPROX_GEOGRAPHY_INTERSECTS.This difference is thefalse positive" rate of the index for a given resolution. For example, at resolution 16, suppose that 10% more records are returned by the approximate function. At resolution 8, the rate only grows to 11%. In that case you should be able to save memory and increase ingest throughput by using the lower number, without significantly harming query performance. Again, this kind of tuning is highly dependent on your particular dataset and workload. The parameters obtained for one application may not apply to another. Your mileage may vary.

You can also change the resolution of polygons passed into geospatial functions at query time. In this example, the resolution of the shape column is unchanged, but the POLYGON in the second argument is broken into 16 parts. Each part is queried separately. A lower number will result in fewer index lookup operations, but may return more false positive candidates from the index which are then filtered out of the result set in the second phase of query execution.

SELECT *
FROM neighborhoods WITH (index = shape, resolution = 16)
WHERE geography_intersects(shape,"POLYGON(...)");

Using GeoJSON Data

SingleStore does not have native GeoJSON support. However, we have generic JSON support as well as computed columns. Combining these features, you can import a subset of GeoJSON data using SingleStore's built-in JSON type and GEOGRAPHYPOINT type.

For example, given a set of GeoJSON points like this:

{
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [123.456, 78.901]
},
"properties": {
"id": 123456,
"name": "Probability Bay Naval Observatory"
}
}

You can construct a table like this:

CREATE ROWSTORE TABLE geostuff (
id AS dataz::properties::%id persisted BIGINT unsigned,
name AS dataz::properties::$name persisted VARCHAR(128),
location AS geography_point(
dataz::geometry::coordinates::%`0`,
dataz::geometry::coordinates::%`1`
) persisted geographypoint,
dataz JSON NOT NULL,
index (location)
);

And then insert the JSON string:

INSERT INTO geostuff (dataz) VALUES
('{"type": "Feature", "geometry":
{"type": "Point","coordinates": [123.456, 78.901]},"properties": {"id": 123456, "name": "Probability Bay Naval Observatory" }}');

And query:

SELECT id, name, location FROM geostuff;
  +--------+-----------------------------------+---------------------------------+
  | id     | name                              | location                        |
  +--------+-----------------------------------+---------------------------------+
  | 123456 | Probability Bay Naval Observatory | POINT(123.45600012 78.90100000) |
  +--------+-----------------------------------+---------------------------------+

Last modified: July 17, 2024

Was this article helpful?