Appendix
Understanding the Extraction Process
Extraction Process
The extraction process consists of two parts:
- Initial Extract
- Delta Extract
Initial Extract
An initial extract is performed the first time Ingest connects to a database.
Delta Extract
After the initial extract, Ingest performs delta extracts.
A typical delta extract log file looks like this:
Extracting 2
Delta Extract database_name:table_name
Info (ME188): Stage pre-BCP
Info (ME190): Stage post-BCP
Info (ME260): Stage post-process
Delta Extract database_name complete (10 records)
Extracted 2
Load file 2
Creating table dbname_schemaname.table_name...
Created table dbname_schemaname.table_name
Loading table dbname_schemaname.table_name with x records(n bytes)
Created new connection org.mariadb.jdbc.Connection@4ca52dc7
Replace data...
Loading ./spool/dbname_schemaname.table_name_2.dat into dbname_schemaname.table_name
Loaded ./spool/dbname_schemaname.table_name_2.dat
Deleted ./spool/dbname_schemaname.table_name_2.dat
Replace data completed
Loaded table dbname_schemaname.table_name(0 of 1 left)
Loaded file 2(Source=2025-01-07 10:01:56 IST)
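When monitoring delta extracts, it can help to pull the record count and the loaded file sequence out of a log like the one above. The following is a minimal sketch that assumes the log lines follow the sample format exactly; the function name `summarize_delta_log` and the regular expressions are illustrative and not part of Ingest.

```python
import re

# Patterns mirror the sample log lines above; real logs may differ in detail.
EXTRACT_DONE = re.compile(
    r"^Delta Extract (?P<db>\S+) complete \((?P<records>\d+) records\)$"
)
FILE_LOADED = re.compile(
    r"^Loaded file (?P<seq>\d+)\(Source=(?P<source>[^)]+)\)$"
)


def summarize_delta_log(text):
    """Return extracted record counts and loaded file sequences found in a log."""
    summary = {"extracts": [], "loaded_files": []}
    for line in text.splitlines():
        line = line.strip()
        m = EXTRACT_DONE.match(line)
        if m:
            summary["extracts"].append((m.group("db"), int(m.group("records"))))
            continue
        m = FILE_LOADED.match(line)
        if m:
            summary["loaded_files"].append((int(m.group("seq")), m.group("source")))
    return summary


# A shortened copy of the sample log shown above.
sample = """Extracting 2
Delta Extract database_name:table_name
Delta Extract database_name complete (10 records)
Extracted 2
Loaded file 2(Source=2025-01-07 10:01:56 IST)"""

print(summarize_delta_log(sample))
```

The source timestamp is kept as a string because the sample uses a time zone abbreviation (IST) that Python's standard library does not parse reliably.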
First Extract
The first extract always needs to be a Full Extract.
Additional Configurations
Source Database
When configuring the source database, the following additional configurations are available for each Extract Type.
- Handle zero length strings: Load zero-length strings directly from the source to the destination.
- Extract Threads: The number of extracting threads to use.
- Log file catchup count: The number of Oracle archive logs processed in one instance.
- Log catchup time (mins):
- Log catchup offset (mins):
- Log file look-ahead:
- Convert RAW to Hex: Convert raw columns to hex strings instead of treating them as CHAR(1).
Destination Database
When configuring the destination database, the following additional configurations are available for each Extract Type.
- Max Updates: Combine updates that exceed this value.
- Load Threads: The number of loading threads to use.
- Add Database Prefix:
- Truncate table instead of drop:
- Schema for all tables: Ignore the source schema and place all tables in this schema on the destination.
- Ignore database name in schema: Check this option to ignore the database name as part of the schema prefix for destination tables.
- Schema for staging tables: Specify the schema name to be used for staging tables in the destination.
- Retain staging tables: Check this option to retain staging tables in the destination.
Flow Events for AWS CloudWatch Logs and SNS
Ingest supports connections to AWS CloudWatch Logs, CloudWatch Metrics, and SNS.
The following is a list of events that Ingest pushes to the AWS CloudWatch Logs console and AWS SNS:
Flow Events | Description |
---|---|
LogfileProcessed | Archive log file processed (Oracle only) |
TableExtracted | Source table extraction complete for SQL Server and Oracle (initial extracts only) |
ExtractCompleted | Source extraction batch is complete |
TableLoaded | Destination table load complete |
LoadCompleted | All destination table loads in a batch complete |
HaltError | Unrecoverable error occurred, disabled the Scheduler |
RetryError | Error occurred, but process will retry |
The following are the details for each of the SingleStore Flow events:
Event: LogfileProcessed
Attribute | Is Metric (Y/N)? | Description |
---|---|---|
type | N | “LogfileProcessed” |
generated | N | Timestamp of message |
source | N | Instance name |
sourceType | N | “CDC” |
fileSeq | N | File sequence |
file | N | File name |
dictLoadMS | Y | Time taken to load dictionary in milliseconds |
CurrentDBDate | N | Current database date |
CurrentServerDate | N | Current Flow server date |
parseMS | Y | Time taken to parse file in milliseconds |
parseComplete | N | Timestamp when parsing is complete |
sourceDate | N | Source date |
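The "Is Metric" column separates numeric measurements from descriptive attributes. The sketch below shows one way a consumer might split a LogfileProcessed event along that line. It assumes the event arrives as a JSON object keyed by the attribute names in the table, which this page does not confirm; the function name and the sample payload are hypothetical.

```python
import json

# Attributes marked "Y" in the "Is Metric" column of the LogfileProcessed table.
LOGFILE_METRICS = {"dictLoadMS", "parseMS"}


def split_logfile_event(raw):
    """Split a LogfileProcessed payload into metric values and descriptive fields.

    Assumes `raw` is a JSON object keyed by the attribute names in the table.
    """
    event = json.loads(raw)
    if event.get("type") != "LogfileProcessed":
        raise ValueError("not a LogfileProcessed event")
    metrics = {k: v for k, v in event.items() if k in LOGFILE_METRICS}
    fields = {k: v for k, v in event.items() if k not in LOGFILE_METRICS}
    return metrics, fields


# Hypothetical payload; the values are made up for illustration.
raw = json.dumps({
    "type": "LogfileProcessed",
    "source": "example-instance",
    "sourceType": "CDC",
    "fileSeq": 2,
    "dictLoadMS": 120,
    "parseMS": 450,
})
metrics, fields = split_logfile_event(raw)
print(metrics)
```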
Event: TableExtracted
Attribute | Is Metric (Y/N)? | Description |
---|---|---|
type | N | “TableExtracted” |
subType | N | Table name |
generated | N | Timestamp of message |
source | N | Instance name |
sourceType | N | “CDC” |
tabName | N | Table name |
success | N | true/false |
message | N | Status message |
sourceTS | N | Source date time |
sourceInserts | Y | Number of Inserts in source |
sourceUpdates | Y | Number of Updates in source |
sourceDeletes | Y | Number of Deletes in source |
Event: ExtractCompleted
Attribute | Is Metric (Y/N)? | Description |
---|---|---|
type | N | “ExtractCompleted” |
generated | N | Timestamp of message |
source | N | Instance name |
sourceType | N | “CDC” |
jobType | N | “EXTRACT” |
jobSubType | N | Extract type |
success | N | Y/N |
message | N | Status message |
runId | N | Run ID |
sourceDate | N | Source date |
dbDate | N | Current database date |
fromSeq | N | Start file sequence |
toSeq | N | End file sequence |
extractId | N | Run ID for extract |
tableErrors | Y | Count of table errors |
tableTotals | Y | Count of total tables |
Event: TableLoaded
Attribute | Is Metric (Y/N)? | Description |
---|---|---|
type | N | “TableLoaded” |
subType | N | Table name |
generated | N | Timestamp of message |
source | N | Instance name |
sourceType | N | “CDC” |
tabName | N | Table name |
success | N | true/false |
message | N | Status message |
sourceTS | N | Source date time |
sourceInserts | Y | Number of Inserts in source |
sourceUpdates | Y | Number of Updates in source |
sourceDeletes | Y | Number of Deletes in source |
destInserts | Y | Number of Inserts in destination |
destUpdates | Y | Number of Updates in destination |
destDeletes | Y | Number of Deletes in destination |
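The destination counters in TableLoaded lend themselves to simple per-table totals. The sketch below sums them across a list of events. It assumes each event has already been decoded into a Python dict keyed by the attribute names above; the function name and the sample values are hypothetical.

```python
from collections import defaultdict

# Destination-side metric attributes from the TableLoaded table.
DEST_COUNTERS = ("destInserts", "destUpdates", "destDeletes")


def total_dest_changes(events):
    """Sum destination counters per table across TableLoaded events."""
    totals = defaultdict(lambda: dict.fromkeys(DEST_COUNTERS, 0))
    for event in events:
        if event.get("type") != "TableLoaded":
            continue  # skip other event types
        for name in DEST_COUNTERS:
            # A missing counter is treated as zero.
            totals[event["tabName"]][name] += int(event.get(name, 0))
    return dict(totals)


# Hypothetical events; the values are made up for illustration.
events = [
    {"type": "TableLoaded", "tabName": "orders", "destInserts": 5, "destUpdates": 2, "destDeletes": 0},
    {"type": "TableLoaded", "tabName": "orders", "destInserts": 3, "destUpdates": 0, "destDeletes": 1},
    {"type": "LoadCompleted", "tableTotals": 1, "tableErrors": 0},
]
print(total_dest_changes(events))
```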
Event: LoadCompleted
Attribute | Is Metric (Y/N)? | Description |
---|---|---|
type | N | “LoadCompleted” |
generated | N | Timestamp of message |
source | N | Instance name |
sourceType | N | “CDC” |
jobType | N | “LOAD” |
jobSubType | N | Subtype of the “LOAD” |
success | N | Y/N |
message | N | Status message |
runId | N | Run ID |
sourceDate | N | Source date |
dbDate | N | Current database date |
fromSeq | N | Start file sequence |
toSeq | N | End file sequence |
extractId | N | Run ID for extract |
tableErrors | Y | Count of table errors |
tableTotals | Y | Count of total tables |
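A batch-level event such as LoadCompleted can be reduced to a short status summary. The sketch below makes two assumptions that this page does not confirm: that the event has been decoded into a dict with `success` carried as the string "Y" or "N", and that `tableTotals` includes the tables counted in `tableErrors`. The function name `batch_status` is hypothetical.

```python
def batch_status(event):
    """Summarize a LoadCompleted (or ExtractCompleted) event.

    Assumes tableTotals includes the tables counted in tableErrors.
    """
    total = int(event["tableTotals"])
    errors = int(event["tableErrors"])
    return {
        "succeeded": event["success"] == "Y",
        "tables_ok": total - errors,
        "tables_failed": errors,
    }


# Hypothetical event; the values are made up for illustration.
event = {
    "type": "LoadCompleted",
    "jobType": "LOAD",
    "success": "N",
    "tableErrors": 1,
    "tableTotals": 4,
}
print(batch_status(event))
```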
Event: HaltError
Attribute | Is Metric (Y/N)? | Description |
---|---|---|
type | N | “HaltError” |
generated | N | Timestamp of message |
source | N | Instance name |
sourceType | N | “CDC” |
message | N | Error message |
errorId | N | Short identifier |
Event: RetryError
Attribute | Is Metric (Y/N)? | Description |
---|---|---|
type | N | “RetryError” |
generated | N | Timestamp of message |
source | N | Instance name |
sourceType | N | “CDC” |
message | N | Error message |
errorId | N | Short identifier |
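Because HaltError disables the Scheduler while RetryError is retried automatically, a subscriber may want to treat the two differently. The following is a minimal sketch of such routing. It assumes the event has been decoded into a dict with the attributes listed above; the function name and the action labels ("page", "log", "ignore") are hypothetical.

```python
def route_error(event):
    """Pick an action for an error event based on its type attribute."""
    kind = event.get("type")
    if kind == "HaltError":
        # Unrecoverable: the Scheduler has been disabled and needs attention.
        return "page"
    if kind == "RetryError":
        # The process retries on its own, so recording the error is enough.
        return "log"
    return "ignore"


# Hypothetical events; the values are made up for illustration.
halt = {"type": "HaltError", "source": "example-instance", "message": "example", "errorId": "E1"}
retry = {"type": "RetryError", "source": "example-instance", "message": "example", "errorId": "E2"}
print(route_error(halt), route_error(retry))
```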
Last modified: January 31, 2025