Load Dataframes From Notebooks

Create a Dataset with Geospatial and Embeddings

Create a dataset with complex data types, such as, integer, date time, geospatial, and vectors.

Run the following command to install Shapely:

Python

!pip3 install shapely

Create the dataset:

Python

from shapely import wkt
import pandas as pd
import numpy as np

data = {
    'Integer': [1, 2, 3, 4],
    'Datetime': [pd.Timestamp(x) for x in ['2023-06-01', '2023-06-02', '2023-06-03', '2023-06-04']],
    'Geospatial Data': [wkt.loads(x) for x in ['POLYGON((1 1,2 1,2 2, 1 2, 1 1))', 'POLYGON((5 1,6 1,6 2,5 2,5 1))', 'POLYGON((5 5,6 5,6 6,5 6,5 5))', 'POLYGON((1 5,2 5,2 6,1 6,1 5))']],
    'Vectors': [np.array(x) for x in [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]]
}


df = pd.DataFrame(data)
display(df)

Connecting to the Database

Select the Connection and the Database to use from the dropdown boxes at the top of the notebook.

Python

from sqlalchemy import *
db_connection = create_engine(connection_url)

Check Table Schema

Check the schema for the table; it is not optimal since geospatial and embeddings data were interpreted as text by the engine.

SQL

%%sql
SELECT
    TABLE_SCHEMA,
    TABLE_NAME,
    COLUMN_NAME,
    COLUMN_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'table_schema_inference' AND TABLE_SCHEMA = 'MyDatabase';

COLUMN_NAME	COLUMN_TYPE
Integer	bigint(20)
Datetime	datetime
Geospatial Data	text
Vectors	text

Create a Table with the Correct Schema

Geospatial indexes are only supported with a rowstore table. Ensure the schema is correct and rowstore in the creation process:

SQL

%%sql
CREATE ROWSTORE TABLE `table_schema_defined` (
  `Integer` bigint(20) DEFAULT NULL,
  `Datetime` datetime DEFAULT NULL,
  `Geospatial Data` GEOGRAPHY NOT NULL,
  `Vectors` blob,
  index (`Geospatial Data`)
)

With the changes made to the SingleStore Python client library, no extra steps are required to convert the data before inserting it into a database. All numpy arrays, shapely and pygeos geometry objects in a dataframe will be converted into the correct format automatically.

Load the Dataframe Into the Predefined Table Schema

This method is fast but will not provide the best schema inference.

SQL

df.to_sql('table_schema_defined', con = db_connection, if_exists = 
'append', index = False, chunksize = 1000)

Verify the Data

Check that the geospatial and vector data have been correctly loaded.

SQL

%%sql
select * from table_schema_defined

Load Dataframes From Notebooks

On this page

Create a Dataset with Geospatial and Embeddings

Connecting to the Database

Check Table Schema

Create a Table with the Correct Schema

Load the Dataframe Into the Predefined Table Schema

Verify the Data

Was this article helpful?

On this page

Was this article helpful?