Audit Log File Format
On this page
Warning
As of the time of this publication, audit logging features are made available and licensed only as part of the SingleStore Premium edition.
Audit logs are written to the directory specified by the auditlogsdir
variable.
Each node in a cluster will create its own files in the destination directory using the following naming convention:
auditlog_<hostname>-<port>_<year>-<month>-<day>_<hour>-<minute>-<second>.log
For example, consider the following log file name:
auditlog_myVmbox-3306_2016-08-30_06-38-46.log
If a log file’s rotation size is very small, an auto-incrementing integer may be appended to the end of the log file.
auditlog_myVmbox-3306_2016-08-30_06-38-46_1.log
auditlog_myVmbox-3306_2016-08-30_06-38-46_2.log
auditlog_myVmbox-3306_2016-08-30_06-38-46_3.log
Data within the log file is written in comma-separated values (CSV) format.
The auditlog_
engine variable controls whether the audit is written to disk synchronously.
Log Entry File Schemas
There are four types of schemas for log file entries, as explained below.
Schema for First Line in a Log File
The first line in a log file has a unique schema that differs from the generic case.ADMIN-ONLY
:
0,2020-08-11 19:04:54.580,INFO: Log opened on MemSQL startup. Format version 2. Log level changed to ADMIN-ONLY
The first line above demonstrates the following schema:
<log-entry-id>,<timestamp>,INFO: <log-file-event-origin>. <log-format-version>. <log-file-event-action>
This line’s schema is described in the table below:
Example Value |
Definition |
---|---|
|
This value is always |
|
This value is identical to a normal timestamp. |
|
Describes the reason or event that caused the creation of the log file. |
|
Indicates the reason or event that opened or rotated the log file. |
Schema for ALL-RESULTS Database Responses
If your node is configured to use the ALL-RESULTS
logging mode, the database’s responses are logged uniquely from other entries.SHOW VARIABLES LIKE '%master_
query:
237,R,master aggregator
The log entry above demonstrates the following schema:
<parent-query-log-entry-id>,R,<data>
Query results share the same entry ID as the query itself.<entry-id>
value of 237
.R
denotes that the log entry only contains query result data.
General Log Entry Schema
All entries in a log file adhere to a predefined schema.UPDATE t SET a = 0 WHERE a = 1
:
237,2016-08-25 11:29:09,PDT,localhost:3306,agg,1,99999,root,db,temp_1_14_16,9959767090931131257,UPDATE t SET a = 0 WHERE a = 1
The log entry above demonstrates the following schema:
<log-entry-id>,<timestamp>,<time-zone>,<hostname>:<port>,<node-type>,<aggregator-id>,<thread-id>,<username>,<database-name>,<correlation-id>,<query-hash>,<query>
Therefore, each of the 12 values can be mapped in the following way:
Example Value |
Definition |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
log-entry-id
The first value is log-entry-id
, which represents the entry’s unique identifier within the scope of a single log file.0
, incrementing by one afterwards.log-entry-id
is a 64-bit unsigned integer value.
The entry ID for any given operation is unique to the node, regardless of how a query is processed across the cluster.
If your database is configured to use the ALL-RESULTS
logging mode, the results will appear as additional log entries with the same <log-entry-id>
as the parent.
timestamp
The second value is timestamp
, which specifies the node’s local system time when an operation was executed.%Y-%m-%d %H:%M:%S.
format.
time-zone
The third value is time-zone
, which specifies the local time zone when an operation was executed.GMT
(Greenwich Mean Time).
hostname:port
The fourth value is hostname:port
, which specifies the node’s hostname and port.hostname
global variable.
node-type
The fifth value is node-type
, which specifies whether the node is an aggregator (agg
) or a leaf (leaf
).
aggregator-id
The sixth value is aggregator-id
, which specifies the ID of the aggregator that issued any given query.aggregator-id
is a 64-bit unsigned integer value.
An aggregator-id
may contain an invalid value of 18446744073709551615
if the aggregator has not yet been given an ID.
In a cluster with multiple aggregators, the ID of the master aggregator never changes, but the ID of a child aggregator may change.
When correlating a master aggregator’s query with its resulting downstream leaf node queries, you can use the combination of an entry’s correlation-id
and aggregator-id
values as a unique identifier.correlation-id
and aggregator-id
as a key.
thread-id
The seventh value is thread-id
, which specifies the ID of the node’s thread that is executing the query.thread-id
is a 32-bit integer value.
username
The eight value is username
, which specifies the username of the authenticated user who is executing the query.
database-name
The ninth value is database-name
, which specifies the name of the database that is being queried.
This value may appear as [unknown]
in a log entry under a few conditions:
-
If the query is not related to a specific database (e.
g. SHOW VARIABLES
). -
If the user does not specify
USE <database-name>
before executing their query.However, in this case the query does contain the database, such as SELECT * from <db-name>.
.t
correlation-id
The tenth value is correlation-id
, which specifies an identifier for the current query.
When correlating a master aggregator’s query with its resulting downstream leaf node queries, you can use the combination of an entry’s correlation-id
and aggregator-id
values as a unique identifier.correlation-id
and aggregator-id
as a key.
Under some circumstances, a correlation-id
value may not appear in the log entry.
query-hash
The 11th value is query-hash
, which specifies a unique hash of either the query signature or the query text.query-hash
is a 64-bit unsigned integer value.
The full query text will only be hashed if the node’s logging level is set to ALL-QUERIES-PLAINTEXT
or ALL-QUERIES-PLAINTEXT-INCLUDING-PARSE-FAILS
.
query
The 12th value is the query
, which specifies either the query signature or the query text.
The full query text will only be written to the log if the node’s logging level is set to ALL-QUERIES-PLAINTEXT
or ALL-QUERIES-PLAINTEXT-INCLUDING-PARSE-FAILS
.
The USING
clause of queries is not captured as part of audit logging and can be viewed via SHOW PROCESSLIST where applicable.
Log Entry Schema for LOGINS-ONLY Level
If your node is configured to use the LOGINS-ONLY
logging mode, the log entry uses the following schema:
<log-entry-id>,<timestamp>,<time-zone>,<hostname>:<port>,<node-type>,USER_LOGIN,<thread-id>,<username>,<remote-host>,<user-grant>,<auth-type>,<success or failure>:<reason>
Note: Login data and the database activity can coexist in one file.
The following log entry uses the schema specified above, and it represents a user pac
with a successful login attempt:
3,2020-08-11 18:53:18.150,UTC,c770dd909a9c:3306,agg,USER_LOGIN,100000,pac,192.168.0.1,pac@%,mysql_native_password,SUCCESS
The following table describes the fields in the schema, and identifies each of the values in the example above:
Entry |
Description |
Example Value |
---|---|---|
|
Represents the log entry’s unique identifier. |
3 |
|
The node’s local system time when an operation was executed. |
2020-08-11 18:53:18. |
|
Specifies the local time zone when an operation was executed. |
UTC |
|
Specifies the node’s hostname and port. |
c770dd909a9c:3306 |
|
Specifies whether the node is an |
agg |
|
A constant which indicates that the record format represents a user login. |
USER_ |
|
Id of the thread executing the statement. |
100000 |
|
Specifies the username that is trying to log in. |
pac |
|
Hostname/address from which the client is connecting. |
192. |
|
The |
pac@% |
|
Indicates the authentication method which is active (for example native password, pam, Kerberos, etc. |
mysql_ |
|
Indicates the success or failure of the login attempt. |
SUCCESS |
For more information on common log entries, see the General Log Entry Schema section above.
Last modified: November 28, 2022