Flow on Helios FAQ

Do I need a license key to use Flow on Helios?

No, a license key is not required. You are only billed for the time your Flow instance is running.

How does billing for Flow work, and how do I pay for it?

Flow on Helios uses a pay-per-usage model. You pay only for the time your Flow instance is running. Charges are deducted from your Helios credits, so no separate payment is needed.

Where can I view the logs?

On the Cloud Portal, go to Ingestion > Load Data, then select your Flow pipeline and click Details. When the dashboard opens, select the Logs tab on the right side of the toolbar.

Which IP addresses do I need to whitelist on the source database?

You must whitelist the outbound IP addresses of your SingleStore cluster in your source database’s network configuration. To find them, navigate to Clusters, open the Actions list (select the three dots) for your cluster, and then select Access & Security. Under Firewall, the IP addresses are listed in the Outbound tab.

Why am I getting the error: "Unable to connect to destination database"?

Please verify that your username and password are correct in the Destination Database configuration. To reset your SingleStore database password, navigate to Clusters, select Connect > Your App for your cluster, and then select Reset password.

If I reset the password for my SingleStore cluster, do I need to update it in all my Flow instances?

Yes, you must update all your Flow pipelines with the new password.

I am unable to view the destination database while creating a Flow pipeline, even though my cluster is active and a database is already attached. Why is that?

This issue may be due to firewall restrictions. Verify your firewall settings and add your current IP address to the list of allowed inbound IPs to ensure database access.

Can I connect to my source database using Flow via private links?

Yes. To connect, create an outbound private link in the Cloud Portal. Refer to Configure Outbound Connections for more information. 

How can I copy all the source tables to a SingleStore database of my choice?

On the Flow dashboard, go to the Destination Database configuration tab and select Advanced Options. Enter the database name in the Database name and optional table name pattern field and in the Database for staging tables field.

Can I define a schema for tables before extraction?

Yes. There are two ways to do this:

Option 1: 

  1. Create all the tables using custom SQL queries.

  2. Go to the Destination Database configuration tab, select Advanced Options and select Truncate table instead of drop.

Option 2:

  1. Select the tables you want to move to SingleStore and enable Skip Initial Extract. 

  2. Go to Operations and do a Full Extract. This creates the tables in the SingleStore database.

  3. Once the tables are created, modify them to add shard or sort keys. 

  4. After modification, go to the Destination Database configuration tab, select Advanced Options and select Truncate table instead of drop.
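For Option 1, the custom SQL typically declares the shard and sort keys up front so the tables never need to be recreated. A minimal sketch of assembling such a statement; the table and column names are hypothetical, not part of any Flow configuration:

```python
# Build a SingleStore CREATE TABLE statement with explicit shard and
# sort keys. All identifiers below are illustrative only.
def build_ddl(table, columns, shard_key, sort_key):
    cols = ",\n  ".join(f"{name} {ctype}" for name, ctype in columns)
    return (
        f"CREATE TABLE {table} (\n"
        f"  {cols},\n"
        f"  SHARD KEY ({shard_key}),\n"
        f"  SORT KEY ({sort_key})\n"
        f")"
    )

ddl = build_ddl(
    "orders",
    [("order_id", "BIGINT"), ("customer_id", "BIGINT"), ("created_at", "DATETIME")],
    shard_key="customer_id",
    sort_key="created_at",
)
print(ddl)
```

Running the resulting DDL against the destination database before enabling Truncate table instead of drop ensures Flow loads into tables with the keys you chose.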

How to copy just the schema of the source before migration?

Select the tables you want to move to SingleStore and enable Skip Initial Extract. Go to Operations and do a Full Extract. This creates the tables in the SingleStore database.

When a connection fails, the error message says "Connection string is invalid. Unable to parse". How can I identify the issue?

This error typically indicates that one of the configuration fields contains spaces, or that the hostname, port, or database name is incorrect. Check these fields for formatting issues or incorrect values and try again.
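Before re-testing the connection, the fields can be scanned for the two issues described above. A small sketch under the assumption that the configuration is available as a plain dictionary (the field names here are hypothetical):

```python
# Scan connection fields for the two common causes of this error:
# stray whitespace in any value, and a non-numeric port.
def find_config_issues(config):
    issues = []
    for field, value in config.items():
        if value != value.strip() or " " in value:
            issues.append(f"{field}: contains whitespace")
    if not config.get("port", "").strip().isdigit():
        issues.append("port: must be numeric")
    return issues

issues = find_config_issues({
    "hostname": "db.example.com ",   # trailing space
    "port": "33o6",                  # typo: letter 'o' instead of zero
    "database": "sales",
})
print(issues)  # flags the hostname whitespace and the bad port
```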

My scheduled pipeline didn’t trigger - why?

This can happen if the scheduler is turned off or misconfigured, for example, set to run at 00h 00m 00s. If the source uses file-based replication (e.g., MySQL or Oracle Log Miner), Ingest may be waiting for the next log file to be created.

I have a database with 1TB of data. I want to migrate all the data to SingleStore and enable CDC for new transactions to my source table. How do I proceed?

For this use case, both Ingest and XL Ingest are needed. Follow these steps for the migration:

  1. Identify tables greater than 5GB. 

  2. Select the tables from the list and select Skip Initial Extract.

  3. Go to Operations and do a Full Extract. This creates the selected tables in SingleStore without any data. 

  4. Verify that the tables are created in SingleStore.

  5. Select XL Ingest from the dropdown list on the top right of the dashboard. 

  6. Migrate the tables using XL Ingest. Refer to SingleStore XL Ingest for more information. 

  7. Once the migration is done, go to Ingest and select all the tables.

  8. Go to Operations and select Sync New Tables. This migrates smaller tables first and then starts the CDC for all the tables. 
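Step 1's split can be scripted when table sizes are known up front. A sketch that decides which tables go through XL Ingest versus a normal Ingest full extract; the table names and sizes are hypothetical:

```python
# Partition tables by size: tables over the threshold are migrated
# with XL Ingest (after marking them Skip Initial Extract in Ingest);
# the rest are handled by a normal Ingest full extract.
def split_by_size(table_sizes_gb, threshold_gb=5):
    xl = {t for t, size in table_sizes_gb.items() if size > threshold_gb}
    regular = set(table_sizes_gb) - xl
    return xl, regular

xl, regular = split_by_size({"orders": 300, "events": 650, "lookup": 0.2})
print(sorted(xl))       # ['events', 'orders']
print(sorted(regular))  # ['lookup']
```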

I want to update my scheduler. How do I do it?

To update the scheduler, go to the Schedule tab, update the configuration, and click Apply. Note that updating the scheduler applies only to time-based replication (MySQL Continuous Log Miner, Oracle Log Miner, SQL Server), not to file-based replication.

Error messages appear at the top of the screen and disappear quickly, making them hard to notice. How can I see them more easily?

You can go to the Logs tab and view all the errors for your instance.

I configured firewall rules to allow IP access, but I still cannot connect Flow to my source database. What could be the issue?

Ensure that the correct outbound IP addresses from your SingleStore cluster have been added to your source database's network configuration. Also, make sure the network allows connections on the port your database is using.

I have successfully established a connection, but I'm encountering authentication errors. Why is the connection failing after setup?

Flow does not automatically update database passwords if they are changed after the connection is created. If the username or password for your source or destination database has been modified, you may receive authentication errors. To resolve this, update the credentials in Flow and re-test the connection.

Why does Flow create a new database when I have specified a destination database during pipeline creation?

By default, Flow creates a new database with the same name as the source database name. To avoid this, go to the Destination Database configuration tab, select Advanced Options and specify the database name in the field Schema for all tables.

How can I monitor the progress of data ingestion without accessing logs?

You can track the ingestion progress from the Flow dashboard. The extraction progress bar provides real-time information on how much data has been transferred and how much is still pending.

How can I load multiple source tables into one target table in Flow?

Currently, Flow does not support loading multiple source tables into a single target table.

How can I add a prefix to the database names migrated from source to SingleStore?

Go to the Destination Database configuration tab, select Advanced Options, and add a prefix of your choice in the Add Database Prefix field.
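The prefix is applied to each migrated database name. Conceptually (the prefix and names below are hypothetical):

```python
# Apply a prefix to each migrated database name, mirroring the
# Add Database Prefix option.
def with_prefix(prefix, db_names):
    return [f"{prefix}{name}" for name in db_names]

print(with_prefix("stg_", ["sales", "inventory"]))  # ['stg_sales', 'stg_inventory']
```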

I see eff_dt and end_dt column errors in the logs, how can I resolve this?

These errors typically occur when Maintain History was enabled during the initial extract but later disabled. This leads to a schema mismatch between the source and the target (SingleStore) database. To resolve this, you can choose one of the following options:

  • Delete the eff_dt and end_dt columns from your SingleStore database.

  • Drop the table from your SingleStore database, select Redo Initial Extract for the table, then go to Operations and select Sync New Tables.
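For the first option, the history columns can be removed with standard ALTER TABLE statements. A sketch that generates them; the table name is hypothetical:

```python
# Generate ALTER TABLE statements that drop the Maintain History
# columns (eff_dt, end_dt) from a destination table.
def drop_history_columns(table):
    return [f"ALTER TABLE {table} DROP COLUMN {col}" for col in ("eff_dt", "end_dt")]

for stmt in drop_history_columns("customer_orders"):
    print(stmt)
```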

Is the ENUM data type supported in Flow's data type casting?

No, Flow does not currently support the ENUM data type in data type casting.

How to load data into reference tables?

Loading data into reference tables using Flow requires using the DDL endpoint for destination connection. Using the DML endpoint may result in an error like:

LOAD DATA into reference table on a child aggregator is not permitted on child aggregators. Try the command again on the master aggregator.

To resolve this:

  1. Identify the cluster group ID of your SingleStore cluster.

  2. In the Flow instance, update the hostname for the SingleStore destination connection so that the ID is replaced with the cluster group ID and the dml tag is changed to ddl. Keep the rest of the hostname unchanged.

    Example:

    svc-78636867-cf31-4b41-4765-24a3997fd429-ddl.aws-ireland-2.svc.singlestore.com

  3. Test the connection again. It should now allow ingesting data into the reference table.
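The hostname rewrite in step 2 is mechanical: swap the embedded ID for the cluster group ID and the -dml tag for -ddl, leaving the rest untouched. A sketch of the transformation, assuming the hostname follows the svc-&lt;id&gt;-dml.&lt;region&gt;.svc.singlestore.com pattern shown above (the input ID below is made up):

```python
# Rewrite a DML endpoint hostname into the DDL endpoint form: replace
# the embedded ID with the cluster group ID and "-dml" with "-ddl";
# the region and domain parts of the hostname stay the same.
def to_ddl_endpoint(hostname, cluster_group_id):
    prefix, rest = hostname.split(".", 1)  # "svc-<id>-dml" / region + domain
    assert prefix.startswith("svc-") and prefix.endswith("-dml")
    return f"svc-{cluster_group_id}-ddl.{rest}"

print(to_ddl_endpoint(
    "svc-11111111-aaaa-bbbb-cccc-000000000000-dml.aws-ireland-2.svc.singlestore.com",
    "78636867-cf31-4b41-4765-24a3997fd429",
))
# svc-78636867-cf31-4b41-4765-24a3997fd429-ddl.aws-ireland-2.svc.singlestore.com
```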

What happens when a new table is added to the source database, and how to start syncing it with Flow?

If a new table is created in the source database, Flow continues running without errors until that table is explicitly selected in the Flow dashboard. If the table is selected but no full extract has been performed and there are no changes on the source table yet, no errors are observed since there is nothing to extract or load. When changes occur on the source table before a full extract has been run, Flow reports errors such as "No columns found for table …" for that table.

Note: Sync Struct detects schema changes only when the number of columns changes. If the number of columns remains the same (for example, a column rename without adding or dropping a column), Sync Struct does not detect the change automatically.

To start syncing the new table:

  1. Turn off the scheduler for the Flow instance.

  2. Select the new table in the Tables configuration.

    1. If the table is large (for example, larger than 5 GB), select Skip Initial Extract so that only the schema is created initially.

  3. Go to Operations and select Sync New Tables.

    1. If the table is marked as Skip Initial Extract, Sync New Tables creates an empty table on the destination. Use XL Ingest to load the data while the scheduler is off, then turn the scheduler back on so CDC can resume.

    2. If the table is not marked as Skip Initial Extract, Sync New Tables runs a full extract and then starts CDC for the new table.

What is the recommended procedure for adding a new column to an existing table?

To add the new column safely:

  1. Allow delta runs to complete so the table is fully synchronized.

  2. Turn off the scheduler for the Flow instance.

  3. Add the column on the source table.

  4. On the Operations page, click Sync Struct to propagate the schema change to the destination.

After Sync Struct completes successfully, CDC resumes with the updated schema.

How are column renames handled, and how to recover from related errors?

Column renames are not handled automatically by Sync Struct. If a column is renamed on the source without recreating the table in Flow, CDC fails with errors indicating a missing column, for example:

  • Error(SOP615: Error details(PGT356): Unable to find column ... - possible structure change

To recover from this error and resume CDC:

  1. Identify the table where one or more columns have been renamed.

  2. Ensure the scheduler is turned off.

  3. In the Tables configuration, select the table and enable Redo Initial Extract.

  4. Go to Operations and run Sync New Tables. This:

    • Drops the existing destination table.

    • Recreates it with the updated schema (including the renamed column).

    • Reloads all data for that table from the source.

To avoid a full reload, the following workaround can be used:

  • Turn off the scheduler.

  • Manually rename the column in the SingleStore database.

  • Roll back to the extract just before the error occurred.

  • Turn the scheduler back on.

Do schema changes in ETL reporting tables require Flow configuration changes?

Flow does not affect ETL tables (for example, *_v2 tables), as these are loaded via stored procedures. If a column needs to be added to an ETL table (such as outcome_v2, config_v2, or item_results_v2), no changes are required in the Flow configuration.

Is it possible to add or modify a shard key column on an existing ETL table?

Shard keys cannot be changed on an existing table. 

How to perform a full resync of my entire environment (all clusters and Flow instances)?

A full environment resync reloads all data across all clusters and all Flow instances connected to that environment.

Full Extract drops and recreates the tables unless the Truncate table instead of drop option is selected. If Truncate table instead of drop is selected, the tables must be dropped manually.

To reload an entire environment:

  1. (Optional) Back up any required tables or databases.

  2. Drop the existing tables or the entire database for the environment that needs to be reloaded. 

Note: This step is only required if Truncate table instead of drop is enabled; otherwise, Full Extract drops tables automatically.

  3. Recommended: Drop the existing Flow instance and create a new one. If this is not possible, reuse the same instance but treat it as a fresh setup.

  4. Reconfigure the pipeline following the standard new customer setup documented in Use Flow on Helios (source configuration, destination configuration, tables, schedule, and initial full extract).

How to perform a full resync for a single customer? 

In Ingest, a single-customer resync reloads all tables for one specific customer (for example, one RDS instance or logical customer environment), without affecting other customers.

To reload the data for a single customer:

  1. (Optional) Back up the existing tables for that customer in SingleStore.

  2. If Truncate table instead of drop is enabled for the pipeline, drop the customer’s tables from SingleStore manually; otherwise, Flow drops and recreates them during the full extract.

  3. In the Tables configuration, select all tables for that customer and enable Redo Initial Extract.

  4. Go to Operations and run Full Extract to reload the data for those tables from the source.

CDC resumes from the latest successful delta after the full extract completes.

Can I resync data for only a specific tenant or a subset of rows?

A tenant is a logical grouping of data - for example, a customer, test, or administrative environment. Tenants are mapped to clusters that share a cluster group (cluster ID). Flow resync operations are defined at the table or database level.

Currently, Flow does not support resyncing only a subset of rows within a table. Resync is supported only for full tables or entire databases.

Workaround

Although Flow cannot resync only a subset of rows, you can approximate this behavior as follows:

  • Delete the affected rows from the table in SingleStore.

  • Identify the primary key values for those rows, and use XL Ingest to re-extract them, either by slicing or by adding the primary key to the WHERE clause.

How to resync only specific tables for a customer? 

Selective table resync reloads one or more chosen tables without affecting the rest of the pipeline, unlike a single-customer resync, which reloads all tables for that customer.

To resync only selected tables:

  1. (Optional) Back up the existing tables to be reloaded.

  2. If Truncate table instead of drop is enabled for the pipeline, drop only the tables that need to be reloaded from SingleStore; otherwise, Flow drops and recreates them during the full extract.

  3. In the Tables configuration, select those tables and enable Redo Initial Extract.

  4. (Optional) If only a subset of rows needs to be re-extracted during the full extract, add a WHERE clause filter for those tables. 

    Note: Enter only the filter condition without the WHERE keyword.

  5. Go to Operations and run Full Extract.

The WHERE clause applies only to the full extract; subsequent CDC runs continue to ingest all changes for those tables, regardless of the filter.
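Since the filter field expects only the condition without the WHERE keyword, a small guard can normalize user input before it is pasted into the configuration. A sketch (this helper is hypothetical, not part of Flow):

```python
# Normalize a row filter for the full-extract filter field: the field
# expects only the condition, so strip a leading WHERE if present.
def normalize_filter(condition):
    cond = condition.strip()
    if cond.upper().startswith("WHERE "):
        cond = cond[len("WHERE "):]
    return cond

print(normalize_filter("WHERE customer_id = 42"))    # customer_id = 42
print(normalize_filter("updated_at >= '2024-01-01'"))
```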


