Flow on Helios FAQ
Do I need a license key to use Flow on Helios?
No, a license key is not required.
How does billing for Flow work, and how do I pay for it?
Flow on Helios uses a pay-per-usage model.
Where can I view the logs?
On the Cloud Portal, go to Ingestion > Load Data, then select your Flow pipeline and click Details.
Which IP addresses do I need to whitelist on the source database?
You must whitelist the outbound IP addresses of your SingleStore cluster in your source database’s network configuration.
Why am I getting the error: "Unable to connect to destination database"?
Please verify that your username and password are correct in the Destination Database configuration.
If I reset the password for my SingleStore cluster, do I need to update it in all my Flow instances?
Yes, you must update all your Flow pipelines with the new password.
I am unable to view the destination database while creating a Flow pipeline, even though my cluster is active and a database is already attached.
This issue may be due to firewall restrictions.
Can I connect to my source database using Flow via private links?
Yes.
How can I copy all the source tables to a SingleStore database of my choice?
On the Flow dashboard, go to the Destination Database configuration tab and select Advanced Options.
Can I define a schema for tables before extraction?
Yes.
Option 1:
- Create all the tables using custom SQL queries.
- Go to the Destination Database configuration tab, select Advanced Options, and select Truncate table instead of drop.

Option 2:
- Select the tables you want to move to SingleStore and enable Skip Initial Extract.
- Go to Operations and run a Full Extract. This creates the tables in the SingleStore database.
- Once the tables are created, modify them to add shard or sort keys.
- After modification, go to the Destination Database configuration tab, select Advanced Options, and select Truncate table instead of drop.
How to copy just the schema of the source before migration?
Select the tables you want to move to SingleStore and enable Skip Initial Extract.
When a connection fails, the error message says "Connection string is invalid."
This error typically indicates that there are spaces in one of the configuration fields, or that the hostname, port, or database name is incorrect.
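Before saving the configuration, you can run the same checks yourself. This is a minimal sketch of that validation, assuming a simple mapping of field names to values; the field names here are illustrative, not Flow's internal names.

```python
def find_config_issues(fields: dict[str, str]) -> list[str]:
    """Return human-readable problems with connection configuration fields.

    `fields` maps illustrative field names (e.g. "hostname") to their values.
    Flags the two issues the error message usually points at: stray spaces
    and a malformed port.
    """
    issues = []
    for name, value in fields.items():
        if value != value.strip():
            issues.append(f"{name}: has leading/trailing spaces")
        elif " " in value:
            issues.append(f"{name}: contains an embedded space")
        if not value.strip():
            issues.append(f"{name}: is empty")
    if "port" in fields and not fields["port"].strip().isdigit():
        issues.append("port: must be numeric")
    return issues

print(find_config_issues({"hostname": "db.example.com ", "port": "3306", "database": "sales"}))
# -> ['hostname: has leading/trailing spaces']
```

An empty list means none of these common problems were found; a remaining failure then points at an incorrect hostname, port, or database name rather than formatting.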
My scheduled pipeline didn’t trigger - why?
This can happen if the scheduler is turned off or misconfigured, for example, set to run at 00h 00m 00s.
I have a database with 1TB of data.
For this use case, both Ingest and XL Ingest are needed.
- Identify tables greater than 5GB.
- Select the tables from the list and select Skip Initial Extract.
- Go to Operations and run a Full Extract. This creates the selected tables in SingleStore without any data.
- Verify that the tables are created in SingleStore.
- Select XL Ingest from the dropdown list on the top right of the dashboard.
- Migrate the tables using XL Ingest. Refer to SingleStore XL Ingest for more information.
- Once the migration is done, go to Ingest and select all the tables.
- Go to Operations and select Sync New Tables. This migrates the smaller tables first and then starts the CDC for all the tables.
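The first step above is a size-based split. As a minimal sketch, assuming you can obtain per-table sizes in bytes from the source (for example, from its information_schema), the routing decision looks like this; the 5 GB cutoff comes from the steps above, and the table names are illustrative.

```python
THRESHOLD_BYTES = 5 * 1024**3  # 5 GB cutoff suggested in the steps above

def split_by_size(table_sizes: dict[str, int]) -> tuple[list[str], list[str]]:
    """Split tables into (ingest, xl_ingest) buckets by size in bytes.

    Tables at or under the threshold go through the normal Ingest path;
    larger tables are created schema-only (Skip Initial Extract) and
    loaded with XL Ingest.
    """
    ingest = sorted(t for t, s in table_sizes.items() if s <= THRESHOLD_BYTES)
    xl_ingest = sorted(t for t, s in table_sizes.items() if s > THRESHOLD_BYTES)
    return ingest, xl_ingest

sizes = {"orders": 12 * 1024**3, "customers": 2 * 1024**3, "events": 800 * 1024**3}
ingest, xl = split_by_size(sizes)
print(ingest)  # ['customers']
print(xl)      # ['events', 'orders']
```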
I want to update my scheduler.
To update the scheduler, go to the Schedule tab, update the configuration, and click Apply.
Error messages appear at the top of the screen and disappear quickly, making them hard to notice.
You can go to the Logs tab and view all the errors for your instance.
I configured firewall rules to allow IP access, but I still cannot connect Flow to my source database.
Ensure that the correct outbound IP addresses from your SingleStore cluster have been added to your source database's network configuration.
I have successfully established a connection, but I'm encountering authentication errors.
Flow does not automatically pick up database password changes made after the connection is created; update the password in the connection configuration.
Why does Flow create a new database when I have specified a destination database during pipeline creation?
By default, Flow creates a new database with the same name as the source database name.
How can I monitor the progress of data ingestion without accessing logs?
You can track the ingestion progress from the Flow dashboard.
How can I load multiple source tables into one target table in Flow?
Currently, Flow does not support loading multiple source tables into a single target table.
How can I add a prefix to the database names migrated from source to SingleStore?
Go to the Destination Database configuration tab, select Advanced Options, and add a prefix of your choice in the Add Database Prefix field.
I see eff_ and end_ column errors in the logs, how can I resolve this?
These errors typically occur when Maintain History was enabled during the initial extract but later disabled.
- Delete the eff_dt and end_dt columns from your SingleStore database.
- Drop the table from your SingleStore database, select Redo Initial Extract for the table, then go to Operations and select Sync New Tables.
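For the column-deletion path above, the cleanup can be expressed as generated SQL. This is a minimal sketch: it assumes the history columns are named eff_dt and end_dt (the naming used by the Maintain History option); adjust if your deployment names them differently, and the table name is illustrative.

```python
# Assumption: Maintain History created columns named eff_dt and end_dt.
HISTORY_COLUMNS = ("eff_dt", "end_dt")

def history_cleanup_sql(table: str) -> list[str]:
    """Generate ALTER TABLE statements that remove the history columns
    left behind when Maintain History was enabled and later disabled."""
    return [f"ALTER TABLE {table} DROP COLUMN {col};" for col in HISTORY_COLUMNS]

for stmt in history_cleanup_sql("orders"):
    print(stmt)
# ALTER TABLE orders DROP COLUMN eff_dt;
# ALTER TABLE orders DROP COLUMN end_dt;
```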
Is the ENUM data type supported in Flow's data type casting?
No, Flow does not currently support the ENUM data type in data type casting.
How to load data into reference tables?
Loading data into reference tables using Flow requires using the DDL endpoint for the destination connection. Using the DML endpoint may result in an error like:
"LOAD DATA into a reference table is not permitted on child aggregators. Try the command again on the master aggregator."
To resolve this:
- Identify the cluster group ID of your SingleStore cluster.
- In the Flow instance, update the hostname for the SingleStore destination connection so that the id is replaced with the cluster group ID and the dml tag is changed to ddl. Keep the rest of the hostname unchanged. Example: svc-78636867-cf31-4b41-4765-24a3997fd429-ddl.aws-ireland-2.svc.singlestore.com
- Test the connection again. It should now allow ingesting data into the reference table.
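The hostname rewrite above is mechanical, so it can be sketched as a small helper. This assumes the svc-&lt;id&gt;-dml.&lt;region&gt;.svc.singlestore.com shape shown in the example; the input id and cluster group ID below are illustrative.

```python
def to_ddl_endpoint(dml_hostname: str, cluster_group_id: str) -> str:
    """Rewrite a Flow destination hostname from the DML endpoint to the DDL
    endpoint: replace the id with the cluster group ID and swap the -dml
    tag for -ddl, leaving the rest of the hostname unchanged.

    Assumes the svc-<id>-dml.<region>.svc.singlestore.com shape.
    """
    head, rest = dml_hostname.split(".", 1)  # "svc-<id>-dml" and the remainder
    if not head.startswith("svc-") or not head.endswith("-dml"):
        raise ValueError("expected a svc-<id>-dml endpoint hostname")
    return f"svc-{cluster_group_id}-ddl.{rest}"

print(to_ddl_endpoint(
    "svc-11111111-2222-3333-4444-555555555555-dml.aws-ireland-2.svc.singlestore.com",
    "78636867-cf31-4b41-4765-24a3997fd429",
))
# svc-78636867-cf31-4b41-4765-24a3997fd429-ddl.aws-ireland-2.svc.singlestore.com
```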
What happens when a new table is added to the source database, and how to start syncing it with Flow?
If a new table is created in the source database, Flow continues running without errors until that table is explicitly selected in the Flow dashboard. Once the table is selected, the logs may show the error "No columns found for table …" for that table.
Note: Sync Struct detects schema changes only when the number of columns changes.
To start syncing the new table:
- Turn off the scheduler for the Flow instance.
- Select the new table in the Tables configuration.
- If the table is large (for example, larger than 5 GB), select Skip Initial Extract so that only the schema is created initially.
- Go to Operations and select Sync New Tables.
- If the table is marked as Skip Initial Extract, Sync New Tables creates an empty table on the destination. Use XL Ingest to load the data while the scheduler is off, then turn the scheduler back on so CDC can resume.
- If the table is not marked as Skip Initial Extract, Sync New Tables runs a full extract and then starts CDC for the new table.
What is the recommended procedure for adding a new column to an existing table?
To add the new column safely:
- Allow delta runs to complete so the table is fully synchronized.
- Turn off the scheduler for the Flow instance.
- Add the column on the source table.
- On the Operations page, click Sync Struct to propagate the schema change to the destination.
After Sync Struct completes successfully, CDC resumes with the updated schema.
How are column renames handled, and how to recover from related errors?
Column renames are not handled automatically by Sync Struct. A renamed column typically results in an error like:
Error(SOP615): Error details(PGT356): Unable to find column … - possible structure change
To recover from this error and resume CDC:
- Identify the table where one or more columns have been renamed.
- Ensure the scheduler is turned off.
- In the Tables configuration, select the table and enable Redo Initial Extract.
- Go to Operations and run Sync New Tables. This drops the existing destination table, recreates it with the updated schema (including the renamed column), and reloads all data for that table from the source.
To avoid a full reload, the following workaround can be used:
- Turn off the scheduler.
- Manually rename the column in the SingleStore database.
- Roll back to the extract just before the error occurred.
- Turn the scheduler back on.
Do schema changes in ETL reporting tables require Flow configuration changes?
Flow does not affect ETL reporting tables (for example, outcome_*, config_*, or item_* tables), as these are loaded via stored procedures; no changes to the Flow configuration are required.
Is it possible to add or modify a shard key column on an existing ETL table?
Shard keys cannot be changed on an existing table. To change the shard key, create a new table with the desired shard key, copy the data into it, and swap it in for the old table.
How to perform a full resync of my entire environment (all clusters and Flow instances)?
A full environment resync reloads all data in the cluster, across all clusters and all Flow instances connected to that environment.
A Full Extract drops and recreates the tables unless the Truncate table instead of drop option is selected.
To reload an entire environment:
- (Optional) Back up any required tables or databases.
- Drop the existing tables or the entire database for the environment that needs to be reloaded. Note: This step is only required if Truncate table instead of drop is enabled; otherwise, Full Extract drops tables automatically.
- Recommended: Drop the existing Flow instance and create a new one. If this is not possible, reuse the same instance but treat it as a fresh setup.
- Reconfigure the pipeline following the standard new customer setup documented in Use Flow on Helios (source configuration, destination configuration, tables, schedule, and initial full extract).
How to perform a full resync for a single customer?
In Ingest, a single-customer resync reloads all tables for one specific customer (for example, one RDS instance or logical customer environment), without affecting other customers.
To reload the data for a single customer:
- (Optional) Back up the existing tables for that customer in SingleStore.
- If Truncate table instead of drop is enabled for the pipeline, drop the customer’s tables from SingleStore manually; otherwise, Flow drops and recreates them during the full extract.
- In the Tables configuration, select all tables for that customer and enable Redo Initial Extract.
- Go to Operations and run Full Extract to reload the data for those tables from the source.
CDC resumes from the latest successful delta after the full extract completes.
Can I resync data for only a specific tenant or a subset of rows?
A tenant is a logical grouping of data - for example, a customer, test, or administrative environment.
Currently, Flow does not support resyncing only a subset of rows within a table.
Workaround
Although Flow cannot resync only a subset of rows, you can approximate this behavior as follows:
- Delete the affected rows from the table in SingleStore.
- Identify the primary key values for those rows, and use XL Ingest to re-extract them, either by slicing or by adding the primary key to the WHERE clause.
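The second step above needs a filter condition covering the deleted rows. This is an illustrative helper (not a Flow API) that builds a primary-key IN condition suitable for a WHERE-clause filter; the column name and values are examples.

```python
def pk_filter(pk_column: str, pk_values: list) -> str:
    """Build a filter condition (no WHERE keyword) selecting specific rows
    by primary key, for re-extraction with XL Ingest."""
    def fmt(v):
        # Quote strings (doubling embedded quotes); leave numbers bare.
        if isinstance(v, bool) or not isinstance(v, (int, float)):
            return "'" + str(v).replace("'", "''") + "'"
        return str(v)
    return f"{pk_column} IN ({', '.join(fmt(v) for v in pk_values)})"

print(pk_filter("order_id", [101, 102, 103]))
# order_id IN (101, 102, 103)
```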
How to resync only specific tables for a customer?
Selective table resync reloads one or more chosen tables without affecting the rest of the pipeline, unlike a single-customer resync, which reloads all tables for that customer.
To resync only selected tables:
- (Optional) Back up the existing tables to be reloaded.
- If Truncate table instead of drop is enabled for the pipeline, drop only the tables that need to be reloaded from SingleStore; otherwise, Flow drops and recreates them during the full extract.
- In the Tables configuration, select those tables and enable Redo Initial Extract.
- (Optional) If only a subset of rows needs to be re-extracted during the full extract, add a WHERE clause filter for those tables. Note: Enter only the filter condition, without the WHERE keyword.
- Go to Operations and run Full Extract.
The WHERE clause applies only to the full extract; subsequent CDC runs continue to ingest all changes for those tables, regardless of the filter.
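Because the filter field expects only the condition, a pasted clause that starts with the WHERE keyword will not work. A minimal sketch of normalizing such input before entering it (illustrative helper, not part of Flow):

```python
def normalize_filter(user_filter: str) -> str:
    """Strip a leading WHERE keyword so only the bare condition remains,
    as the filter field expects."""
    cond = user_filter.strip()
    if cond.upper().startswith("WHERE "):
        cond = cond[6:].lstrip()
    return cond

print(normalize_filter("WHERE customer_id = 42"))  # customer_id = 42
print(normalize_filter("customer_id = 42"))        # customer_id = 42
```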