Scheduling Notebooks with SingleStore Job Service

Note

This is a Preview feature.

With SingleStore Job Service, you can run your notebooks on a time-based schedule. This helps operationalize your notebook code to power your production scenarios like:

  • Data preparation and ML flows on your data stored in SingleStore

  • Building and sharing dashboards with Python charting libraries

  • Performing transforms and caching results in your SingleStore database

  • Ingesting data from various sources into SingleStore

Developers can use the Notebook interface to run these scenarios with the full security of notebooks, without having to coordinate separate work streams and data flows for these tasks.

Create a New Scheduled Job

You can create a new Scheduled Job from within a shared notebook or through the "Jobs" section in the left navigation. Each Job Run executes the latest version of the notebook saved just before its scheduled start time. For example, if a Job Run is scheduled to start at 11:05 UTC, a copy of the notebook is saved at that time and executed as part of the run.

You can edit a notebook that is referenced by a Job. If you do, make sure the notebook is saved and in its final state before the next scheduled run to prevent unexpected errors. Alternatively, duplicate the notebook and edit the duplicate so that the changes are not reflected in the Scheduled Job runs.

To create a new Scheduled Job:

  1. Navigate to a shared notebook.

    Note

    You can only schedule shared notebooks.

  2. Click Schedule.

  3. Select the SingleStore deployment your notebook will connect to.

    By selecting your SingleStore deployment, you can leverage native connection to the SingleStore data source referenced in the notebook. You can also run a Scheduled Job without a deployment attached to it. This can be useful when connecting to multiple SingleStore deployments in a single notebook.

    Note

    See Configurability and Key Considerations for how the deployments in your Organization can impact the maximum execution time of each Job run.

  4. Choose between scheduling this job to run either One Time or on a Recurring basis.

  5. Choose the Starting Date and Time you want for this job.

    Each Job Run executes the latest version of the notebook saved before the run's start time. See Configurability and Key Considerations.

  6. Set the interval between scheduled runs.

    The minimum interval time is 1 hour and the maximum interval is 7 days.

    Note

    The interval you configure is the time between the completion of one run and the start of the next. For example, if a job runs every hour, and the first run starts at 13:00 and takes 5 minutes to complete, the next run begins at 14:05.

  7. Specify whether you want to save snapshots.

    A snapshot is a saved version of the notebook after the completion of a Job Run. Opt in for snapshots to know exactly what version of the notebook was used for each run.

    By default, snapshots are saved for all Error and Failed runs.

    Note

    Snapshots and History are only saved for the last 25 Job runs. See Configurability and Key Considerations.
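The interval rule in step 6 can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not part of the Job Service itself; the function name `next_run_start` and the constants are hypothetical.

```python
from datetime import datetime, timedelta

# Interval limits as stated in step 6.
MIN_INTERVAL = timedelta(hours=1)
MAX_INTERVAL = timedelta(days=7)


def next_run_start(previous_start: datetime,
                   run_duration: timedelta,
                   interval: timedelta) -> datetime:
    """Return the start time of the next run.

    The interval is counted from the completion of the previous run,
    not from its start. (Hypothetical helper for illustration.)
    """
    if not MIN_INTERVAL <= interval <= MAX_INTERVAL:
        raise ValueError("interval must be between 1 hour and 7 days")
    previous_end = previous_start + run_duration
    return previous_end + interval


# The example from the note: hourly job, first run at 13:00, takes 5 minutes.
first_start = datetime(2024, 4, 1, 13, 0)
second_start = next_run_start(first_start,
                              timedelta(minutes=5),
                              timedelta(hours=1))
print(second_start.strftime("%H:%M"))  # 14:05
```

Because each run's duration pushes back the next start, the start times of a recurring job can drift over the day rather than staying on fixed clock times.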

Manage an Existing Job

To see existing Scheduled Jobs and Job Runs, navigate to "Jobs" in the portal.

To delete an existing Job, navigate to "Jobs" in the left navigation and select Delete Job.

Note

Deleting a Job will delete all existing job runs associated with it, including snapshots.

You can edit your job settings for any active job. Navigate to Actions > Edit. For each job you can change:

  • The start time

  • The notebook being run on a schedule

  • The workspace and database deployment it is attached to

  • The frequency at which the job runs

Scheduled jobs cannot be changed from Recurring to One Time. To make this change, delete and recreate the scheduled job.

You cannot suspend an existing job.

Job Run Statuses

Here are the various possible job statuses and what they mean.

| Status | Description |
|---|---|
| Scheduled | This is the next Job Run for this job. It will execute the latest version of the notebook saved before the Start Time associated with the job. |
| Completed | The notebook in this job run ran to completion. If you enabled "save snapshots", you will be able to download the completed notebook. |
| Failed | One of the cells in the notebook failed, which prevented the notebook from running to completion. |
| Error | There was an error unrelated to the notebook code that prevented this job run from completing. See Errors. |
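As a rough illustration of how the statuses relate to snapshots, the sketch below encodes the rules stated on this page: snapshots are always saved for Failed and Error runs, and for Completed runs only when "save snapshots" is enabled. The enum and the `snapshot_expected` function are hypothetical names, not part of any SingleStore API.

```python
from enum import Enum


class JobRunStatus(Enum):
    """The four Job Run statuses listed above (hypothetical enum)."""
    SCHEDULED = "Scheduled"
    COMPLETED = "Completed"
    FAILED = "Failed"
    ERROR = "Error"


def snapshot_expected(status: JobRunStatus, save_snapshots: bool) -> bool:
    """Return True if a snapshot should exist for a run with this status."""
    if status in (JobRunStatus.FAILED, JobRunStatus.ERROR):
        # Saved by default, regardless of the job's snapshot setting.
        return True
    if status is JobRunStatus.COMPLETED:
        # Saved only when the job opted in to snapshots.
        return save_snapshots
    # A Scheduled run has not executed yet, so there is nothing to save.
    return False


print(snapshot_expected(JobRunStatus.ERROR, save_snapshots=False))      # True
print(snapshot_expected(JobRunStatus.COMPLETED, save_snapshots=False))  # False
```

Note that snapshots are also subject to the 25-run retention limit, which this sketch does not model.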

Diagnose Error Job Runs

For Job Runs with the Error status, a snapshot of the notebook is saved automatically. Navigate to "Jobs" in the left navigation, find your Job, and download the snapshot associated with the job run to diagnose the error.

| Error | Solution |
|---|---|
| Workspace/Cluster Deleted | Create a new Job Schedule with another deployment target. |
| Workspace/Cluster Suspended | Resume the Workspace/Cluster or create a new Job Schedule with another deployment target. |
| Database Detached | Reattach the database with the right permissions or create a new Job Schedule with another database target. |
| Internal Errors / Misc | Reach out to Support or use the chat feature in the Portal. |
| Notebook Timed Out | Navigate to the snapshot and see the cells which ran to completion and where the notebook timed out. See Configurability and Key Considerations for the Execution Time limits set on a Job Run. |
| Notebook Deleted/Not present | Create a new Job Schedule with another notebook. |

Configurability and Key Considerations

When scheduling notebooks via the job service, keep the following in mind:

  • A Job Run executes the latest version of the notebook saved before the start time of the run. If you want to edit a notebook without affecting the Scheduled Job runs, create a duplicate of the notebook and edit the duplicate.

  • A maximum execution time is set for each Job Run. This is determined based on the deployment types present in the Organization:

    | Deployment Type | Limit |
    |---|---|
    | Active Standard Workspace / Cluster | Determined by the execution interval (in minutes): Execution Time = Execution Interval (mins) / 12. The minimum execution time is 30 minutes. The maximum execution time is 120 minutes. |
    | Active Starter Workspace based on SingleStore Shared Tier | 15 minutes |
    | No Active Deployment | 5 minutes |

  • History and Snapshots are saved only for the most recent 25 job runs per job.
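The execution time limits above can be worked through with a short Python sketch. It assumes that the 30-minute minimum and 120-minute maximum act as a clamp on the Interval / 12 result for Standard deployments, which is one reading of the table. The function name and the deployment labels (`"standard"`, `"starter"`, `"none"`) are hypothetical.

```python
from typing import Optional

# Bounds for Standard deployments, as stated in the table above.
STANDARD_MIN_MINUTES = 30
STANDARD_MAX_MINUTES = 120


def max_execution_minutes(deployment: str,
                          interval_minutes: Optional[float] = None) -> float:
    """Return the maximum execution time of a Job Run, in minutes.

    `deployment` is a hypothetical label: "standard", "starter", or "none".
    """
    if deployment == "standard":
        if interval_minutes is None:
            raise ValueError("an interval is required for Standard deployments")
        # Assumption: Interval / 12, clamped to the 30..120 minute range.
        raw = interval_minutes / 12
        return float(min(max(raw, STANDARD_MIN_MINUTES), STANDARD_MAX_MINUTES))
    if deployment == "starter":
        return 15.0
    if deployment == "none":
        return 5.0
    raise ValueError(f"unknown deployment type: {deployment!r}")


print(max_execution_minutes("standard", 60))     # 30.0  (60 / 12 = 5, raised to the minimum)
print(max_execution_minutes("standard", 720))    # 60.0  (12-hour interval)
print(max_execution_minutes("standard", 10080))  # 120.0 (7 days, capped at the maximum)
```

Under this reading, any interval of 6 hours or less yields the 30-minute minimum, and any interval of 24 hours or more yields the 120-minute maximum.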

Last modified: April 1, 2024
