Configure Core Files

Overview

A core file, also known as a core dump, is a recorded state of the working memory of a computer program at a specific time, and is typically created when a program crashes or otherwise terminates abnormally. In the case of SingleStore, a core file typically contains complete information about what caused a node to crash. Some node crashes cannot be debugged without an associated core file.

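Should a node crash, a core file is generated either in the data directory of the impacted node (for example, core.<pid>) or, on hosts where systemd-coredump manages core files, in the /var/lib/systemd/coredump/ directory. The exact file name and location depend on the host's kernel.core_pattern setting. If a core file is not generated when a node crashes, it is likely because core file creation is disabled.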

The amount of disk space used by a core file is roughly equal to the value of Total_server_memory (SHOW STATUS EXTENDED LIKE 'Total_server_memory';) at the time the core file is created. As nodes can consume many gigabytes of memory, SingleStore recommends that each host in the cluster have enough free disk space to hold a core file from every node on that host.
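To check a host before a crash happens, the following bash sketch compares the free space in a directory against a byte count you supply. It assumes GNU coreutils df. The function name core_space_ok and the way the required size is computed (for example, the sum of Total_server_memory across all nodes on the host, converted to bytes) are assumptions, not part of SingleStore tooling.

```bash
# Requires bash and GNU coreutils df.
# Hypothetical helper: check whether <directory> has at least <bytes-needed> free.
# <bytes-needed> is supplied by you, e.g. the sum of Total_server_memory
# (converted to bytes) for every node on this host.
# Usage: core_space_ok <directory> <bytes-needed>
core_space_ok() {
  local dir=$1 needed=$2 avail
  # Print only the "Avail" column in 1-byte units, drop the header, strip spaces
  avail=$(df --output=avail -B1 -- "$dir" | tail -n 1 | tr -d '[:space:]')
  [[ $avail =~ ^[0-9]+$ ]] || return 2   # df failed or printed something unexpected
  if (( avail >= needed )); then
    echo "OK: ${avail} bytes free in ${dir} (need ${needed})"
  else
    echo "WARNING: ${avail} bytes free in ${dir}, need ${needed}" >&2
    return 1
  fi
}
```

For example, core_space_ok /var/lib/systemd/coredump $(( 64 * 1024 * 1024 * 1024 )) checks for 64 GiB of free space; point it at the node's data directory instead if core files are written there.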

Refer to Core dump for more information.

Core Files and systemd

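Some Linux systems use systemd (systemd-coredump) to manage core files. To check whether a host does, look at the kernel.core_pattern value reported by sysctl; if it pipes core files to systemd-coredump, as in the following output, systemd is managing them.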

sudo sysctl -A | grep core_pattern
kernel.core_pattern = |/lib/systemd/systemd-coredump %P %u %g %s %t %h
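The same check can be scripted. This bash sketch classifies a core_pattern value as systemd-coredump, another pipe handler, or a plain file name template; the function name core_handler is hypothetical.

```bash
# Requires bash.
# Hypothetical helper: classify the kernel core dump handler.
# Usage: core_handler [pattern]   (reads /proc/sys/kernel/core_pattern if omitted)
core_handler() {
  local pattern
  if (( $# > 0 )); then
    pattern=$1
  else
    pattern=$(< /proc/sys/kernel/core_pattern)
  fi
  if [[ $pattern == '|'*systemd-coredump* ]]; then
    echo "systemd-coredump"
  elif [[ $pattern == '|'* ]]; then
    # Another program receives the core on stdin (for example, apport on Ubuntu)
    echo "pipe:${pattern:1}"
  else
    # A file name template; a name without a path is written to the
    # crashing process's current working directory
    echo "file:${pattern}"
  fi
}
```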

If systemd-coredump is managing core files:

  1. Determine where node core files are being saved.

    coredumpctl list
    TIME                          PID UID GID SIG     COREFILE     EXE                                                 SIZE
    Wed 2023-12-06 18:48:37 UTC 71347 114 121 SIGABRT inaccessible /opt/singlestoredb-server-8.1.30-e0a67e68e5/memsqld  n/a

    Use the memsqld PID from the coredumpctl list output with the following command.

    coredumpctl info 71347
               PID: 71347 (memsqld)
               UID: 114 (memsql)
               GID: 121 (memsql)
            Signal: 6 (ABRT)
         Timestamp: Wed 2023-12-06 18:48:29 UTC (13min ago)
      Command Line: /opt/singlestoredb-server-8.1.30-e0a67e68e5/memsqld --defaults-file /var/lib/memsql/b9dee0dc-aa76-4444-a874-c6579b547920/memsql.cnf --user 114
        Executable: /opt/singlestoredb-server-8.1.30-e0a67e68e5/memsqld
     Control Group: /user.slice/user-1000.slice/session-2251.scope
              Unit: session-2251.scope
             Slice: user-1000.slice
           Session: 2251
         Owner UID: 1000 (ubuntu)
           Boot ID: 1126edb4aa674b0ba092468a03efee65
        Machine ID: 970e8353a4364205a8b6964379f8418b
          Hostname: ip-172-31-31-120
           Storage: /var/lib/systemd/coredump/core.memsqld.114.1126edb4aa674b0ba092468a03efee65.71347.1701888509000000.zst (inaccessible)
           Message: Process 71347 (memsqld) of user 114 dumped core.
                    Found module /opt/singlestoredb-server-8.1.30-e0a67e68e5/memsqld without build-id.
                    Stack trace of thread 71347:
                    #0  0x00007fb6ae318dbf n/a (n/a + 0x0)

    As the Storage line in the output above shows, core files are being saved to /var/lib/systemd/coredump.

  2. Check the following settings in the /etc/systemd/coredump.conf file and set each of them to 8G or greater.

    [Coredump]
    ProcessSizeMax=8G
    ExternalSizeMax=8G
    JournalSizeMax=8G
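Instead of editing coredump.conf in place, systemd also reads drop-in files from /etc/systemd/coredump.conf.d/*.conf, whose settings override the main file (see coredump.conf(5)). The following bash sketch writes such a drop-in; the file name 50-singlestore.conf and the helper name write_coredump_dropin are hypothetical.

```bash
# Requires bash. Run from a root shell when writing under /etc.
# Hypothetical helper: write the [Coredump] size limits to a drop-in file.
# Usage: write_coredump_dropin [directory] [size]
write_coredump_dropin() {
  local dir=${1:-/etc/systemd/coredump.conf.d} size=${2:-8G}
  mkdir -p -- "$dir" || return 1
  # The same three settings as above, all set to <size>
  cat > "$dir/50-singlestore.conf" <<EOF
[Coredump]
ProcessSizeMax=${size}
ExternalSizeMax=${size}
JournalSizeMax=${size}
EOF
}
```

For example, write_coredump_dropin /etc/systemd/coredump.conf.d 16G writes the three limits as 16G.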

By default, systemd-coredump keeps core files for only 3 days; older files are removed by systemd-tmpfiles. To display the core file retention period, search the tmpfiles.d configuration for the coredump directory.

grep coredump /usr/lib/tmpfiles.d/systemd.conf
d /var/lib/systemd/coredump 0755 root root 3d

The last field is the age limit. In this case, core files are kept for 3d, or 3 days.
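To read only the retention value, the following bash sketch prints the age field of the coredump entry. It assumes the six-field tmpfiles.d layout shown above (type, path, mode, user, group, age); the helper name coredump_retention is hypothetical.

```bash
# Requires bash and awk.
# Hypothetical helper: print the age limit for /var/lib/systemd/coredump.
# Usage: coredump_retention [file]   (defaults to /usr/lib/tmpfiles.d/systemd.conf)
coredump_retention() {
  local file=${1:-/usr/lib/tmpfiles.d/systemd.conf}
  # Fields: Type Path Mode User Group Age; print Age, fail if no entry was found
  awk '$1 == "d" && $2 == "/var/lib/systemd/coredump" { print $6; found = 1 }
       END { exit !found }' "$file"
}
```

Files in /etc/tmpfiles.d/ override same-named files in /usr/lib/tmpfiles.d/, so to keep core files longer, copy systemd.conf to /etc/tmpfiles.d/ and change the age there.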

Enable Core Files

While core files can be enabled for a single node, SingleStore recommends enabling them for all nodes in the cluster, as doing so can assist with any subsequent debugging efforts.

  1. Check the current core file status.

    SELECT @@core_file;
    +-------------+
    | @@core_file |
    +-------------+
    |           0 |
    +-------------+
    1 row in set (0.00 sec)

    The default value of @@core_file is 1, which means that creating core files is enabled. A value of 0, as in the output above, means that creating core files is disabled.

  2. To enable core files on all nodes in the cluster, update the memsql.cnf file of each node with the following command.

    sdb-admin update-config --all --key "core_file" --value "1" -y
  3. Because the new setting takes effect only after a node restarts, restart all of the nodes in the cluster.

    • For high-availability (HA) clusters, perform a rolling restart, where the workload continues to run during the restart.

      sdb-admin restart-node --all --online
    • For non-HA clusters, perform an offline restart, where the workload stops running during the restart.

      sdb-admin restart-node --all
  4. Confirm that creating core files is enabled.

    SELECT @@core_file;
    +-------------+
    | @@core_file |
    +-------------+
    |           1 |
    +-------------+
    1 row in set (0.00 sec)

Set the Core File Mode

SingleStore core files can either be partial or full. By default, a partial core file is created.

  1. Check the current core file mode.

    SELECT @@core_file_mode;
    +------------------+
    | @@core_file_mode |
    +------------------+
    | PARTIAL          |
    +------------------+
    1 row in set (0.00 sec)
  2. To change the core file mode, update the memsql.cnf file of each node in the cluster.

    • For full core files

      sdb-admin update-config --all --key "core_file_mode" --value "FULL" -y
    • For partial core files

      sdb-admin update-config --all --key "core_file_mode" --value "PARTIAL" -y
  3. Because the new core file mode takes effect only after a node restarts, restart all of the nodes in the cluster.

    • For high-availability (HA) clusters, perform a rolling restart, where the workload continues to run during the restart.

      sdb-admin restart-node --all --online
    • For non-HA clusters, perform an offline restart, where the workload stops running during the restart.

      sdb-admin restart-node --all
  4. Confirm that the desired core file mode has been set.

    SELECT @@core_file_mode;
    +------------------+
    | @@core_file_mode |
    +------------------+
    | FULL             |
    +------------------+
    1 row in set (0.00 sec)
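To catch a mistyped mode before any node is restarted, this bash sketch validates the value and prints the matching update-config command from step 2. The helper name core_file_mode_cmd is hypothetical, and normalizing the input to upper case is a choice of this sketch.

```bash
# Requires bash 4+ (for ${var^^}).
# Hypothetical helper: validate a core file mode and print the step 2 command.
# Usage: core_file_mode_cmd full|partial
core_file_mode_cmd() {
  local mode=${1^^}   # normalize to the upper-case values shown above
  case $mode in
    FULL|PARTIAL) ;;
    *) echo "core_file_mode must be FULL or PARTIAL, got '${1}'" >&2; return 1 ;;
  esac
  printf 'sdb-admin update-config --all --key core_file_mode --value %s -y\n' "$mode"
}
```

For example, core_file_mode_cmd full prints the FULL command shown in step 2; run it, then restart the nodes as in step 3.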

Set the Core File Size Limit

While the core_file and core_file_mode SingleStore engine variables control whether and how core files are created, the Linux Max core file size limit (RLIMIT_CORE) of the memsqld process must also be set to unlimited. Otherwise, core files may either be truncated or zero bytes in length.

Use the following commands to verify and set the Max core file size limits for the memsqld process.

  1. Find the memsqld server PID.

    pgrep memsqld
    6164 <- memsqld server PID
    6166 <- memsqld command process PID
  2. Check the Max core file size limits. The output reflects the soft limit, the hard limit, and the size units.

    cat /proc/6164/limits | grep -i core
    Max core file size    	0               unlimited               bytes
  3. If these limits are not set to unlimited, set them to unlimited.

    1. First, change the limits to 0.

      sudo prlimit --core=0 --pid=6164
      cat /proc/6164/limits | grep -i core
      Max core file size    	 0              0                bytes
    2. Next, change the limits to unlimited.

      sudo prlimit --core=unlimited --pid=6164
      cat /proc/6164/limits | grep -i core
      Max core file size      unlimited      unlimited        bytes

      Note that prlimit will not allow the hard limit of the user to be exceeded, which is typically defined in the /etc/security/limits.conf file.

      prlimit changes the Max core file size limits of the running process only; the change is lost when the process restarts or the host is rebooted. To set the limit permanently, add the following line to the /etc/security/limits.conf file, substituting the username that runs memsqld for <username> (typically, memsql), and reboot the host. The - in the second field sets both the soft and the hard limit.

      <username>           -       core            unlimited
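The checks in steps 2 and 3 can be scripted. This bash sketch reads the Max core file size line from /proc/<pid>/limits (or from a saved copy of that file) and reports whether both limits are unlimited; the helper names core_limits and core_unlimited are hypothetical.

```bash
# Requires bash and awk.
# Hypothetical helper: print "<soft> <hard>" core file size limits.
# Usage: core_limits <pid | limits-file>
core_limits() {
  local src=$1
  # A bare number is treated as a PID; anything else as a path to a limits file
  [[ $src =~ ^[0-9]+$ ]] && src=/proc/$src/limits
  # Line format: "Max core file size  <soft>  <hard>  bytes" -> fields 5 and 6
  awk '/^Max core file size/ { print $5, $6; found = 1 } END { exit !found }' "$src"
}

# Hypothetical helper: succeed only if both limits are unlimited.
# Usage: core_unlimited <pid | limits-file>
core_unlimited() {
  local soft hard
  read -r soft hard < <(core_limits "$1") || return 2
  [[ $soft == unlimited && $hard == unlimited ]]
}
```

For example, for pid in $(pgrep -x memsqld); do echo "$pid: $(core_limits "$pid")"; done lists the limits of every memsqld process; pgrep -x matches the process name exactly, so memsqld_safe is not included.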

Refer to How to Use the ulimit Linux Command {With Examples} for more information.

Set the Core File Name and Location

By default, a core file is created in the current working directory of the process that crashed. However, core file names and locations can be defined in the /proc/sys/kernel/core_pattern file.

cat /proc/sys/kernel/core_pattern
|/usr/lib/systemd/systemd-coredump %e %d %p %u %g %h %s %t

Where:

  • %e: executable filename of the process that crashed (its comm value, truncated to 15 characters)

  • %d: core dump mode

  • %p: PID of the process

  • %u: UID under which the process was running

  • %g: GID under which the process was running

  • %h: hostname on which the process was running

  • %s: signal that caused the core dump

  • %t: time the core dump occurred, in seconds since the Epoch

Refer to the core(5) man page for more information and additional % specifiers.
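To preview the name a template produces, the following bash sketch expands a core_pattern for a given executable, PID, hostname, and timestamp. It handles only %e, %p, %h, %t, and %%, following the core(5) rules for those specifiers; it is an illustration, not the kernel's implementation, and the function name is hypothetical.

```bash
# Requires bash.
# Hypothetical helper: expand a subset of core_pattern specifiers.
# Other specifiers (e.g. %u) are left as-is here, although the kernel substitutes them.
# Usage: expand_core_pattern <pattern> <exe> <pid> <host> <epoch-seconds>
expand_core_pattern() {
  local pattern=$1 exe=$2 pid=$3 host=$4 when=$5
  local out="" c i=0 n=${#pattern}
  while (( i < n )); do
    c=${pattern:i:1}
    i=$(( i + 1 ))
    if [[ $c != "%" ]]; then
      out+=$c
      continue
    fi
    (( i < n )) || break            # a lone trailing % is dropped, as core(5) describes
    c=${pattern:i:1}
    i=$(( i + 1 ))
    case $c in
      e) out+=${exe:0:15} ;;        # comm value is truncated to 15 characters
      p) out+=$pid ;;
      h) out+=$host ;;
      t) out+=$when ;;
      %) out+="%" ;;
      *) out+="%$c" ;;              # not handled by this sketch
    esac
  done
  printf '%s\n' "$out"
}
```

For example, expand_core_pattern '/var/crash/core.%e.%p.%h.%t' memsqld 2151 myhost 1534356958 prints /var/crash/core.memsqld.2151.myhost.1534356958 (myhost is a placeholder hostname).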

Temporarily

The following steps demonstrate how to set a custom core file name and location that lasts until the host is rebooted.

  1. Review the current core_pattern file.

    cat /proc/sys/kernel/core_pattern
    core
  2. Update the core_pattern file.

    sudo sysctl -w kernel.core_pattern=/var/crash/core.%e.%p.%h.%t
    kernel.core_pattern = /var/crash/core.%e.%p.%h.%t
  3. Confirm that the core_pattern file has been updated.

    cat /proc/sys/kernel/core_pattern
    /var/crash/core.%e.%p.%h.%t
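The kernel does not create the directory named in core_pattern, and core(5) lists a missing directory as one reason a core file is not produced, so /var/crash must exist before this pattern can work. The following bash sketch extracts the directory from a file-based pattern and checks that it exists; the helper name core_pattern_dir is hypothetical.

```bash
# Requires bash.
# Hypothetical helper: print the directory a file-based core_pattern writes to.
# Usage: core_pattern_dir [pattern]   (reads /proc/sys/kernel/core_pattern if omitted)
core_pattern_dir() {
  local pattern dir
  if (( $# > 0 )); then pattern=$1; else pattern=$(< /proc/sys/kernel/core_pattern); fi
  if [[ $pattern == '|'* ]]; then
    echo "core_pattern pipes to a program; no directory to check" >&2
    return 1
  fi
  if [[ $pattern != */* ]]; then
    echo "."                        # written to the crashing process's working directory
    return 0
  fi
  dir=${pattern%/*}
  [[ -n $dir ]] || dir=/            # a pattern such as /core.%p
  if [[ ! -d $dir ]]; then
    echo "missing directory: $dir" >&2
    return 2
  fi
  echo "$dir"
}
```

You may also want to confirm that the user running memsqld can write there, for example with sudo -u memsql test -w /var/crash && echo writable.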

Permanently

The core file name and/or location can be set permanently by editing the /etc/sysctl.conf file. This will persist the contents of the core_pattern file after a host is rebooted.

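  1. Create an example core_files directory in /tmp. (Many systems clear /tmp when the host is rebooted, so in production, use a persistent directory with enough free disk space.)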

    mkdir -p /tmp/core_files
  2. Change the permissions of this directory so that files can be saved to it.

    sudo chmod a+rwx /tmp/core_files
  3. Edit the kernel.core_pattern value.

    sudo vi /etc/sysctl.conf

    Add the following line.

    kernel.core_pattern = /tmp/core_files/core.%e.%p.%h.%t
  4. On systems that use /etc/sysconfig/init (typically RHEL and CentOS), update the DAEMON_COREFILE_LIMIT value.

    sudo vi /etc/sysconfig/init

    Update or add the following line.

    DAEMON_COREFILE_LIMIT='unlimited'
  5. Run the following command to apply these changes.

    sudo sysctl -p
    kernel.core_pattern = /tmp/core_files/core.%e.%p.%h.%t
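If /etc/sysctl.conf already contains a kernel.core_pattern line, adding another one leaves two assignments of the same key. The following bash sketch sets the key idempotently, replacing any existing uncommented assignment instead of appending a duplicate; the helper name set_sysctl_key is hypothetical.

```bash
# Requires bash and awk. Run from a root shell to edit /etc/sysctl.conf.
# Hypothetical helper: set <key> = <value> in a sysctl config file exactly once.
# Usage: set_sysctl_key <file> <key> <value>
set_sysctl_key() {
  local file=$1 key=$2 value=$3 tmp
  [[ -e $file ]] || : > "$file" || return 1
  tmp=$(mktemp) || return 1
  # Pass key/value via the environment so awk does not interpret backslashes
  K=$key V=$value awk '
    BEGIN { k = ENVIRON["K"]; v = ENVIRON["V"] }
    {
      name = $0
      sub(/=.*/, "", name)                  # text before the first "="
      gsub(/^[ \t]+|[ \t]+$/, "", name)     # trim surrounding whitespace
    }
    index($0, "=") && name == k { next }    # drop the old assignment
    { print }
    END { print k " = " v }
  ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  cat "$tmp" > "$file"                      # rewrite in place, keeping owner and mode
  rm -f "$tmp"
}
```

For example, from a root shell in which the function is defined: set_sysctl_key /etc/sysctl.conf kernel.core_pattern /tmp/core_files/core.%e.%p.%h.%t && sysctl -p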

Test the Core File Configuration

The following steps demonstrate how to test your core file configuration by deliberately triggering a core dump.

  1. Find the memsqld process for the cluster’s Master Aggregator.

    ps aux | grep memsqld
    memsql    2151  0.0  0.0 1493432 4040 ?        Ssl  14:23   0:00 /opt/singlestoredb-server-8.7.14-4c3ad9de46/memsqld_safe --defaults-file /var/lib/memsql/5241afea-479c-404c-8db3-c6a3d58f3b8c/memsql.cnf --user 985 --auto-restart StagedEnable
    memsql    2152  0.0  0.0 1427896 4016 ?        Ssl  14:23   0:00 /opt/singlestoredb-server-8.7.14-4c3ad9de46/memsqld_safe --defaults-file /var/lib/memsql/5393b5d2-09db-468d-a32c-919a00396585/memsql.cnf --user 985 --auto-restart StagedEnable
    memsql    2170 23.5  2.6 3114064 212496 ?      Sl   14:23   1:46 /opt/singlestoredb-server-8.7.14-4c3ad9de46/memsqld --defaults-file /var/lib/memsql/5241afea-479c-404c-8db3-c6a3d58f3b8c/memsql.cnf --user 985
    memsql    2174 22.8  2.5 3108116 208076 ?      Sl   14:23   1:43 /opt/singlestoredb-server-8.7.14-4c3ad9de46/memsqld --defaults-file /var/lib/memsql/5393b5d2-09db-468d-a32c-919a00396585/memsql.cnf --user 985
    memsql    2228  0.0  1.0 456640 85872 ?        Ssl  14:23   0:00 /opt/singlestoredb-server-8.7.14-4c3ad9de46/memsqld --defaults-file /var/lib/memsql/5241afea-479c-404c-8db3-c6a3d58f3b8c/memsql.cnf --user 985
    memsql    2229  0.0  1.0 456640 85880 ?        Ssl  14:23   0:00 /opt/singlestoredb-server-8.7.14-4c3ad9de46/memsqld --defaults-file /var/lib/memsql/5393b5d2-09db-468d-a32c-919a00396585/memsql.cnf --user 985
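  2. To create a core file, send the ABRT signal to the main memsqld process of the Master Aggregator, not to its memsqld_safe wrapper (PIDs 2151 and 2152 in the output above). Use the --defaults-file path to identify the Master Aggregator's node; this example assumes that its memsqld process is 2170.

    sudo kill -ABRT 2170
  3. Change to the directory where core files are saved. In this example, the directory is /tmp/core_files.

    cd /tmp/core_files/
  4. Confirm that the core file name matches what is defined in the core_pattern file.

    The output will resemble the following.

    ls
    core.memsqld.2170.master-agg-and-leaf-ip-10-0-0-191.1534356958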
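To make the check in step 4 repeatable, this bash sketch lists the core files in a directory whose names contain a given PID, assuming the core.%e.%p.%h.%t pattern used in this example. It also warns about empty files, which can indicate that the Max core file size limit was not raised. The helper name find_core_for_pid is hypothetical.

```bash
# Requires bash.
# Hypothetical helper: list core files for <pid> in <directory>.
# Assumes names of the form core.<exe>.<pid>.<host>.<time>.
# Usage: find_core_for_pid <directory> <pid>
find_core_for_pid() {
  local dir=$1 pid=$2 f status=1
  for f in "$dir"/core.*."$pid".*; do
    [[ -e $f ]] || continue             # the glob matched nothing
    printf '%s\n' "$f"
    [[ -s $f ]] || echo "warning: $f is empty; check the Max core file size limit" >&2
    status=0
  done
  return "$status"
}
```

For example, run find_core_for_pid /tmp/core_files <pid> with the PID that was signaled in step 2.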

Last modified: September 19, 2024
