Load Dataframes From Notebooks
On this page
The simplest and fastest way to load data is using a pandas dataframe.
Create a Dataset with Geospatial and Embeddings
Create a dataset with complex data types, such as, integer, date time, geospatial, and vectors.
Run the following command to install Shapely:
!pip3 install shapely
Create the dataset:
from shapely import wktimport pandas as pdimport numpy as npdata = {'Integer': [1, 2, 3, 4],'Datetime': [pd.Timestamp(x) for x in ['2023-06-01', '2023-06-02', '2023-06-03', '2023-06-04']],'Geospatial Data': [wkt.loads(x) for x in ['POLYGON((1 1,2 1,2 2, 1 2, 1 1))', 'POLYGON((5 1,6 1,6 2,5 2,5 1))', 'POLYGON((5 5,6 5,6 6,5 6,5 5))', 'POLYGON((1 5,2 5,2 6,1 6,1 5))']],'Vectors': [np.array(x) for x in [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]]}df = pd.DataFrame(data)display(df)
Connecting to the Database
Select the Connection and the Database to use from the dropdown boxes at the top of the notebook.
from sqlalchemy import *db_connection = create_engine(connection_url)
Check Table Schema
Check the schema for the table; it is not optimal since geospatial and embeddings data were interpreted as text by the engine.
%%sqlSELECTTABLE_SCHEMA,TABLE_NAME,COLUMN_NAME,COLUMN_TYPEFROM INFORMATION_SCHEMA.COLUMNSWHERE TABLE_NAME = 'table_schema_inference' AND TABLE_SCHEMA = 'MyDatabase';
COLUMN_ |
COLUMN_ |
---|---|
Integer |
bigint(20) |
Datetime |
datetime |
Geospatial Data |
text |
Vectors |
text |
Create a Table with the Correct Schema
Geospatial indexes are only supported with a rowstore table.
%%sqlCREATE ROWSTORE TABLE `table_schema_defined` (`Integer` bigint(20) DEFAULT NULL,`Datetime` datetime DEFAULT NULL,`Geospatial Data` GEOGRAPHY NOT NULL,`Vectors` blob,index (`Geospatial Data`))
With the changes made to the SingleStore Python client library, no extra steps are required to convert the data before inserting it into a database.
Load the Dataframe Into the Predefined Table Schema
This method is fast but will not provide the best schema inference.
df.to_sql('table_schema_defined', con = db_connection, if_exists ='append', index = False, chunksize = 1000)
Verify the Data
Check that the geospatial and vector data have been correctly loaded.
%%sqlselect * from table_schema_defined
Last modified: August 14, 2024