# Load Dataframes From Notebooks

The simplest and fastest way to load data is using a pandas dataframe. However, the schema inference is not always optimal. Define the table and schema in advance to avoid having to make future schema alterations.

## Create a Dataset with Geospatial and Embeddings

Create a dataset with complex data types, such as, integer, date time, geospatial, and vectors.

Run the following command to install Shapely:

```python
!pip3 install shapely
```

Create the dataset:

```python
from shapely import wkt
import pandas as pd
import numpy as np

data = {
    'Integer': [1, 2, 3, 4],
    'Datetime': [pd.Timestamp(x) for x in ['2023-06-01', '2023-06-02', '2023-06-03', '2023-06-04']],
    'Geospatial Data': [wkt.loads(x) for x in ['POLYGON((1 1,2 1,2 2, 1 2, 1 1))', 'POLYGON((5 1,6 1,6 2,5 2,5 1))', 'POLYGON((5 5,6 5,6 6,5 6,5 5))', 'POLYGON((1 5,2 5,2 6,1 6,1 5))']],
    'Vectors': [np.array(x) for x in [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]]
}


df = pd.DataFrame(data)
display(df)
```

## Connecting to the Database

Select the Connection and the Database to use from the dropdown boxes at the top of the notebook.

```python
from sqlalchemy import *
db_connection = create_engine(connection_url)
```

## Check Table Schema

Check the schema for the table; it is not optimal since geospatial and embeddings data were interpreted as text by the engine.

```sql
%%sql
SELECT
    TABLE_SCHEMA,
    TABLE_NAME,
    COLUMN_NAME,
    COLUMN_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'table_schema_inference' AND TABLE_SCHEMA = 'MyDatabase';
```

| COLUMN\_NAME    | COLUMN\_TYPE |
| --------------- | ------------ |
| Integer         | bigint(20)   |
| Datetime        | datetime     |
| Geospatial Data | text         |
| Vectors         | text         |

## Create a Table with the Correct Schema

Geospatial indexes are only supported with a rowstore table. Ensure the schema is correct and rowstore in the creation process:

```sql
%%sql
CREATE ROWSTORE TABLE `table_schema_defined` (
  `Integer` bigint(20) DEFAULT NULL,
  `Datetime` datetime DEFAULT NULL,
  `Geospatial Data` GEOGRAPHY NOT NULL,
  `Vectors` blob,
  index (`Geospatial Data`)
) 
```

With the changes made to the SingleStore Python client library, no extra steps are required to convert the data before inserting it into a database. All numpy arrays, shapely and pygeos geometry objects in a dataframe will be converted into the correct format automatically.

## Load the Dataframe Into the Predefined Table Schema

This method is fast but will not provide the best schema inference.

```sql
df.to_sql('table_schema_defined', con = db_connection, if_exists = 
'append', index = False, chunksize = 1000)
```

## Verify the Data

Check that the geospatial and vector data have been correctly loaded.

```sql
%%sql
select * from table_schema_defined
```

***

Modified at: November 17, 2025

Source: [/cloud/container-services/notebooks/load-dataframes-from-notebooks/](https://docs.singlestore.com/cloud/container-services/notebooks/load-dataframes-from-notebooks/)

(An index of the documentation is available at /llms.txt)
