Appendix

Understanding the Extraction Process

Extraction Process

The extraction process consists of two parts:

  • Initial Extract

  • Delta Extract

Initial Extract

An initial extract is performed the first time Ingest connects to a database. During this extract, the entire table is replicated from the source database to the destination.

Delta Extract

After the initial extract, Ingest performs delta extracts. Delta extracts capture only the changes made since the last extraction and merge them with the destination.

A typical delta extract log file looks like this:

Extracting 2
Delta Extract database_name:table_name
Info (ME188): Stage pre-BCP
Info (ME190): Stage post-BCP
Info (ME260): Stage post-process
Delta Extract database_name complete (10 records)
Extracted 2
Load file 2
Creating table dbname_schemaname.table_name...
Created table dbname_schemaname.table_name
Loading table dbname_schemaname.table_name with x records(n bytes)
Created new connection org.mariadb.jdbc.Connection@4ca52dc7
Replace data...
Loading ./spool/dbname_schemaname.table_name_2.dat into dbname_schemaname.table_name
Loaded ./spool/dbname_schemaname.table_name_2.dat
Deleted ./spool/dbname_schemaname.table_name_2.dat
Replace data completed
Loaded table dbname_schemaname.table_name(0 of 1 left)
Loaded file 2(Source=2025-01-07 10:01:56 IST)

First Extract

The first extract must always be a full extract, which replicates the entire table. Subsequent extractions are delta extracts that run periodically at the frequency you configure.
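To make the two-phase behavior concrete, the following is a minimal, hedged sketch of the same pattern written against Python's DB-API: a full copy on the first run, then an upsert/delete merge of captured changes on later runs. It is conceptual only and does not reflect Ingest's internal implementation; the connection objects, change-tracking table, and column names are hypothetical, and it assumes a driver that uses %s placeholders (such as a MySQL or MariaDB connector).

    # Conceptual sketch only (not the Ingest implementation).
    # Table, column, and change-tracking names are hypothetical.

    def initial_extract(src_conn, dst_conn, table):
        """First run: replicate the entire source table to the destination."""
        src = src_conn.cursor()
        dst = dst_conn.cursor()
        src.execute(f"SELECT id, col1, col2 FROM {table}")
        dst.executemany(
            f"INSERT INTO {table} (id, col1, col2) VALUES (%s, %s, %s)",
            src.fetchall(),
        )
        dst_conn.commit()

    def delta_extract(src_conn, dst_conn, table, last_extract_ts):
        """Later runs: merge only the changes captured since the previous run."""
        src = src_conn.cursor()
        dst = dst_conn.cursor()
        # Hypothetical change table holding the operation type and changed row.
        src.execute(
            f"SELECT op, id, col1, col2 FROM {table}_changes WHERE changed_at > %s",
            (last_extract_ts,),
        )
        for op, row_id, col1, col2 in src.fetchall():
            if op == "D":
                dst.execute(f"DELETE FROM {table} WHERE id = %s", (row_id,))
            else:
                # Inserts and updates are merged as an upsert on the destination.
                dst.execute(
                    f"REPLACE INTO {table} (id, col1, col2) VALUES (%s, %s, %s)",
                    (row_id, col1, col2),
                )
        dst_conn.commit()

In practice, as the log above shows, Ingest spools the captured changes to a file and applies them to the destination during the load phase rather than row by row.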

Additional Configurations

Source Database

When configuring the source database, the following additional settings are available for each Extract Type.

  • Handle zero length strings: Load zero-length strings directly from the source to the destination.

  • Extract Threads: The number of extraction threads to use.

  • Log file catchup count: The number of Oracle archive logs processed in one instance.

  • Log catchup time (mins):

  • Log catchup offset (mins):

  • Log file look-ahead:

  • Convert RAW to Hex: Convert raw columns to hex strings instead of treating them as CHAR(1).

Destination Database

When configuring the destination database, the following additional settings are available for each Extract Type.

  • Max Updates: Combine updates that exceed this value.

  • Load Threads: The number of loading threads to use.

  • Add Database Prefix:

  • Truncate table instead of drop:

  • Schema for all tables: Ignore the source schema and place all tables in this schema on the destination.

  • Ignore database name in schema: Check this option to ignore the database name as part of the schema prefix for destination tables.

  • Schema for staging tables: Specify the schema name to be used for staging tables in the destination.

  • Retain staging tables: Check this option to retain staging tables in the destination.

Flow Events for AWS CloudWatch Logs and SNS

Ingest supports connections to AWS CloudWatch Logs, CloudWatch Metrics, and SNS. These integrations enable monitoring of Ingest operations and interaction with other assets that use AWS infrastructure. AWS CloudWatch Logs can capture event logs from Ingest, such as load completion or failure, and these logs can be used to monitor error conditions and trigger alarms.

The following is a list of events that Ingest pushes to the AWS CloudWatch Logs console and AWS SNS:

Flow Event            Description
LogfileProcessed      Archive log file processed (Oracle only)
TableExtracted        Source table extraction complete for SQL Server and Oracle (initial extracts only)
ExtractCompleted      Source extraction batch is complete
TableLoaded           Destination table load complete
LoadCompleted         All destination table loads in a batch complete
HaltError             Unrecoverable error occurred; the Scheduler is disabled
RetryError            Error occurred, but the process will retry
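For example, the HaltError event can be watched from the CloudWatch Logs side so that an unrecoverable failure raises an alert. The following is a minimal, hedged illustration using boto3; the log group name and region are placeholders, and it assumes Ingest has already been configured to push its flow events to that log group.

    import boto3

    # Scan a CloudWatch Logs group for HaltError events pushed by Ingest.
    # The log group name and region below are placeholders for illustration.
    logs = boto3.client("logs", region_name="us-east-1")

    response = logs.filter_log_events(
        logGroupName="/ingest/flow-events",   # placeholder: your configured log group
        filterPattern='"HaltError"',          # match unrecoverable-error events
    )

    for event in response.get("events", []):
        print(event["timestamp"], event["message"])

A CloudWatch metric filter and alarm on the same pattern achieves the equivalent without polling.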

The following are the details for each of the SingleStore Flow events:

Event: LogfileProcessed

Attribute           Is Metric (Y/N)?   Description
type                N                  “LogfileProcessed”
generated           N                  Timestamp of message
source              N                  Instance name
sourceType          N                  “CDC”
fileSeq             N                  File sequence
file                N                  File name
dictLoadMS          Y                  Time taken to load dictionary in milliseconds
CurrentDBDate       N                  Current database date
CurrentServerDate   N                  Current Flow server date
parseMS             Y                  Time taken to parse file in milliseconds
parseComplete       N                  Timestamp when parsing is complete
sourceDate          N                  Source date

Event: TableExtracted

Attribute           Is Metric (Y/N)?   Description
type                N                  “TableExtracted”
subType             N                  Table name
generated           N                  Timestamp of message
source              N                  Instance name
sourceType          N                  “CDC”
tabName             N                  Table name
success             N                  true/false
message             N                  Status message
sourceTS            N                  Source date time
sourceInserts       Y                  Number of Inserts in source
sourceUpdates       Y                  Number of Updates in source
sourceDeletes       Y                  Number of Deletes in source

Event: ExtractCompleted

Attribute           Is Metric (Y/N)?   Description
type                N                  “ExtractCompleted”
generated           N                  Timestamp of message
source              N                  Instance name
sourceType          N                  “CDC”
jobType             N                  “EXTRACT”
jobSubType          N                  Extract type
success             N                  Y/N
message             N                  Status message
runId               N                  Run ID
sourceDate          N                  Source date
dbDate              N                  Current database date
fromSeq             N                  Start file sequence
toSeq               N                  End file sequence
extractId           N                  Run ID for extract
tableErrors         Y                  Count of table errors
tableTotals         Y                  Count of total tables

Event: TableLoaded

Attribute           Is Metric (Y/N)?   Description
type                N                  “TableLoaded”
subType             N                  Table name
generated           N                  Timestamp of message
source              N                  Instance name
sourceType          N                  “CDC”
tabName             N                  Table name
success             N                  true/false
message             N                  Status message
sourceTS            N                  Source date time
sourceInserts       Y                  Number of Inserts in source
sourceUpdates       Y                  Number of Updates in source
sourceDeletes       Y                  Number of Deletes in source
destInserts         Y                  Number of Inserts in destination
destUpdates         Y                  Number of Updates in destination
destDeletes         Y                  Number of Deletes in destination

Event: LoadCompleted

Attribute           Is Metric (Y/N)?   Description
type                N                  “LoadCompleted”
generated           N                  Timestamp of message
source              N                  Instance name
sourceType          N                  “CDC”
jobType             N                  “LOAD”
jobSubType          N                  Subtype of the “LOAD”
success             N                  Y/N
message             N                  Status message
runId               N                  Run ID
sourceDate          N                  Source date
dbDate              N                  Current database date
fromSeq             N                  Start file sequence
toSeq               N                  End file sequence
extractId           N                  Run ID for extract
tableErrors         Y                  Count of table errors
tableTotals         Y                  Count of total tables

Event: HaltError

Attribute           Is Metric (Y/N)?   Description
type                N                  “HaltError”
generated           N                  Timestamp of message
source              N                  Instance name
sourceType          N                  “CDC”
message             N                  Error message
errorId             N                  Short identifier

Event: RetryError

Attribute           Is Metric (Y/N)?   Description
type                N                  “RetryError”
generated           N                  Timestamp of message
source              N                  Instance name
sourceType          N                  “CDC”
message             N                  Error message
errorId             N                  Short identifier
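As a hedged sketch of how these attributes might be consumed downstream, the following routes a flow event by its type attribute. It assumes the event body arrives as JSON carrying the attributes listed above, for example via an SNS-subscribed Lambda function; the function names are illustrative and the parsing should be adjusted to the actual message format received.

    import json

    def handle_flow_event(raw_message: str) -> None:
        """Route an Ingest flow event by its type attribute (illustrative only)."""
        event = json.loads(raw_message)
        etype = event.get("type")
        if etype in ("HaltError", "RetryError"):
            print(f"{etype}: {event.get('errorId')} - {event.get('message')}")
        elif etype in ("TableExtracted", "TableLoaded"):
            print(f"{etype}: {event.get('tabName')} success={event.get('success')}")
        elif etype in ("ExtractCompleted", "LoadCompleted"):
            print(f"{etype}: run {event.get('runId')}, "
                  f"{event.get('tableErrors')} of {event.get('tableTotals')} tables in error")
        else:
            print(f"Unhandled event type: {etype}")

    # Example: unwrap the message from an SNS-triggered Lambda invocation.
    # def lambda_handler(sns_event, context):
    #     for record in sns_event["Records"]:
    #         handle_flow_event(record["Sns"]["Message"])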
