Sep 23 2022 · Piotr Płoński

The 5 ways to schedule Jupyter Notebook

Jupyter Notebook is an excellent tool for creating computational documents. The mix of code and markdown makes it well suited to data-rich reports, dashboards, and slides. Often the notebook's results need to be refreshed at a regular interval and published to the rest of the team. For example, you may need to produce a daily report from a Jupyter Notebook and email it to team members as a PDF file.

In this post, I will summarize five different approaches for Jupyter Notebook scheduling. I want to cover the following features for each method:

  • where the notebook is executed, locally or in the cloud,
  • whether a user interface for schedule management is available,
  • option to export notebook to HTML or PDF files,
  • automatically send notebook as an attachment by email,
  • share a link to the executed notebook,
  • restrict access to the notebook for selected (authenticated) users,
  • history of previous executions available in User Interface,
  • parametrized executions with an option to override,
  • hide code in the executed notebook,
  • possibility to execute the notebook as slides.

1. Cron

Cron is the obvious place to start. It is a job scheduler on Unix-like operating systems that runs scripts and applications at fixed dates or periodic intervals. It is controlled by a crontab file, in which each line holds a schedule expression (five time fields) followed by the command to execute. The syntax is shown below; the crontab manual page describes it in more detail.

# ┌───────────── minute (0 - 59)
# │ ┌───────────── hour (0 - 23)
# │ │ ┌───────────── day of the month (1 - 31)
# │ │ │ ┌───────────── month (1 - 12)
# │ │ │ │ ┌───────────── day of the week (0 - 6) (Sunday to Saturday;
# │ │ │ │ │                                   7 is also Sunday on some systems)
# │ │ │ │ │
# │ │ │ │ │
# * * * * * <command to execute>
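To make the five fields concrete, here is a minimal sketch in Python that checks whether a given moment matches a crontab expression. The helper names `expand_field` and `cron_matches` are made up for this illustration, and the sketch is a simplification: it supports only `*`, single values, ranges, lists, and steps, and it requires both the day-of-month and day-of-week fields to match, whereas real cron implementations treat those two fields specially when both are restricted.

```python
from datetime import datetime


def expand_field(field: str, low: int, high: int) -> set:
    """Expand one crontab field such as '*', '1-5', '*/15' or '0,30' into a set of ints."""
    values = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/")
            step = int(step_text)
        if part == "*":
            start, end = low, high
        elif "-" in part:
            start_text, end_text = part.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        values.update(range(start, end + 1, step))
    return values


def cron_matches(expression: str, moment: datetime) -> bool:
    """Return True if `moment` falls on the schedule described by `expression`."""
    minute, hour, day, month, weekday = expression.split()
    # cron counts Sunday as 0 (and sometimes 7); isoweekday() gives Monday=1 .. Sunday=7.
    cron_weekday = moment.isoweekday() % 7
    weekdays = {value % 7 for value in expand_field(weekday, 0, 7)}
    return (
        moment.minute in expand_field(minute, 0, 59)
        and moment.hour in expand_field(hour, 0, 23)
        and moment.day in expand_field(day, 1, 31)
        and moment.month in expand_field(month, 1, 12)
        and cron_weekday in weekdays
    )
```

For example, `cron_matches("0 8 * * *", datetime(2022, 9, 23, 8, 0))` is true, because the expression means "minute 0 of hour 8, on any day".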

Cron can execute any script, but it is far from a complete solution for Jupyter Notebook scheduling: you need to build the rest of the system yourself. As a consolation, some packages make this easier. You can use nbconvert or papermill to execute notebooks.

GitHub repository nbconvert

nbconvert can execute a notebook and save it in many formats (HTML, PDF, LaTeX, Markdown). It offers options to select which cells are executed and to hide the code when exporting to HTML or PDF.

GitHub repository papermill

papermill is more advanced in terms of execution options. You can parametrize notebooks and write the executed notebooks to external storage, such as AWS S3 or Azure Blob Storage.
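As a rough illustration of parametrized execution, the sketch below builds a papermill command line, where each `-p name value` pair overrides one notebook parameter. The function name, the file names, and the `ticker` parameter are hypothetical; the notebook is assumed to have a cell tagged `parameters`, and papermill must be installed for the command to actually run.

```python
import subprocess


def build_papermill_command(input_path, output_path, parameters):
    """Build a papermill CLI call that injects `parameters` into the notebook."""
    command = ["papermill", input_path, output_path]
    for name, value in parameters.items():
        # Each -p flag sets one parameter in the executed copy of the notebook.
        command += ["-p", name, str(value)]
    return command


def run_papermill(input_path, output_path, parameters):
    """Execute the notebook; raises CalledProcessError if the run fails."""
    subprocess.run(
        build_papermill_command(input_path, output_path, parameters), check=True
    )


if __name__ == "__main__":
    # Hypothetical file names, shown only to illustrate the resulting command.
    print(build_papermill_command("report.ipynb", "report-out.ipynb", {"ticker": "TSLA"}))
```

The output path is where the executed notebook is written; with the right extras installed it could also point at remote storage instead of a local file.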

The example crontab file that executes the notebook every day at 8:00 AM might look like this:

#
# example crontab 
# executes the notebook every day at 8:00
#

0 8 * * * jupyter nbconvert --to html --execute /path/to/notebook.ipynb
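In practice it is often easier to have cron call a small wrapper script rather than a long command, partly because cron runs jobs with a minimal environment. The sketch below is one possible wrapper, assuming `jupyter` is on the PATH; the function names and paths are placeholders. It adds `--no-input` to hide the code cells and `--output-dir` to collect the reports in one place.

```python
import subprocess
import sys


def build_nbconvert_command(notebook, output_dir, fmt="html", hide_code=True):
    """Build the nbconvert call that executes and exports a notebook."""
    command = ["jupyter", "nbconvert", "--to", fmt, "--execute"]
    if hide_code:
        # --no-input leaves code cells out of the exported document.
        command.append("--no-input")
    command += ["--output-dir", str(output_dir), str(notebook)]
    return command


def run_notebook(notebook, output_dir):
    """Run the export and return the exit code so cron can report failures."""
    result = subprocess.run(
        build_nbconvert_command(notebook, output_dir),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # cron typically mails anything written to stderr to the job owner.
        print(result.stderr, file=sys.stderr)
    return result.returncode


if __name__ == "__main__":
    # Placeholder paths; replace them with real locations.
    sys.exit(run_notebook("/path/to/notebook.ipynb", "/path/to/reports"))
```

The crontab line would then call this script with the Python interpreter instead of calling `jupyter nbconvert` directly.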

Cron can run on a local machine or on a remote server. There is no user interface; everything is controlled through the crontab file. Users who want to share the resulting notebooks by link or email, with restricted access, need to implement that themselves. It is a bare-bones solution: a general-purpose scheduling tool without any features specific to Jupyter Notebooks. Use cron if you have very custom requirements and the time to implement a scheduling solution from scratch.
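As an example of what "implement it yourself" means, here is a minimal sketch of the email step using only the Python standard library. The function names, addresses, SMTP host, and credentials are all placeholders, and a real setup would also need error handling and secure storage for the password.

```python
import smtplib
from email.message import EmailMessage


def build_report_email(sender, recipients, subject, filename, payload):
    """Create an email with the exported notebook (bytes) attached."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = ", ".join(recipients)
    message["Subject"] = subject
    message.set_content("The scheduled notebook finished. The report is attached.")
    if filename.lower().endswith(".pdf"):
        message.add_attachment(
            payload, maintype="application", subtype="pdf", filename=filename
        )
    else:
        # Anything that is not a PDF is treated as an HTML export in this sketch.
        message.add_attachment(
            payload.decode("utf-8"), subtype="html", filename=filename
        )
    return message


def send_report(message, host, port, user, password):
    """Send the message through an SMTP server that supports STARTTLS."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(user, password)
        server.send_message(message)
```

A wrapper script run by cron could read the exported file, pass its bytes to `build_report_email`, and then call `send_report` with the team's SMTP settings.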

2. Windows Task Scheduler

Windows Task Scheduler is the Windows equivalent of Unix's cron. Unlike cron, it has a graphical user interface. You can use nbconvert or papermill on Windows as well.

Windows Task Scheduler

Like cron, Windows Task Scheduler is a general-purpose application scheduler with no features specific to Jupyter Notebook. All additional components need to be implemented manually. It might be a good solution when notebooks are executed locally and the output is consumed by one person (the owner of the computer), because sharing requires a custom web server.
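Tasks can also be registered from the command line with the `schtasks` tool instead of the graphical interface. The sketch below only builds such a command for a daily run; the function name, task name, and notebook path are placeholders, and commands containing paths with spaces may need extra quoting inside the `/TR` value.

```python
def build_schtasks_command(task_name, run_command, start_time="08:00"):
    """Build a schtasks call that registers a task running once a day."""
    return [
        "schtasks", "/Create",
        "/TN", task_name,     # name shown in Task Scheduler
        "/TR", run_command,   # command line the task will run
        "/SC", "DAILY",       # schedule type
        "/ST", start_time,    # start time in 24-hour HH:mm format
    ]


if __name__ == "__main__":
    # Placeholder task name and notebook path.
    command = build_schtasks_command(
        "DailyNotebook",
        "jupyter nbconvert --to html --execute C:\\reports\\notebook.ipynb",
    )
    print(" ".join(command))
```

On a Windows machine, the resulting list could be passed to `subprocess.run(command, check=True)` to create the task.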

3. Notebooker

Notebooker is open-source software for scheduling Jupyter Notebooks, provided by Man Group (a financial company). It is released under the AGPL-v3 license.

GitHub repository Notebooker

Notebooker is a Flask-based web application (with MongoDB as the backend database) for scheduling and sharing Jupyter Notebooks. It provides a graphical user interface for managing them.

Notebooker list notebooks

Credits: Image from https://github.com/man-group/notebooker

Each notebook has a view listing its executed tasks. The user interface lets you rerun or delete a notebook:

Notebooker notebook tasks

Credits: Image from https://github.com/man-group/notebooker

The notebook can be executed with custom parameters. Note that there is an option to convert the notebook to PDF and send it by email:

Notebooker parametrize notebook

Credits: Image from https://github.com/man-group/notebooker

The resulting notebook is displayed in the browser. There is a toggle button to show/hide code.

Notebooker executed notebook

Credits: Image from https://github.com/man-group/notebooker

The Notebooker solution looks very promising. You can run it locally or in the cloud with the docker-compose command. Notebooks can be shared by link or email, and an executed notebook can be converted to PDF.

The biggest drawback is the lack of user management. You can't restrict access to a notebook: anyone with a link to the Notebooker instance has access to all notebooks. There is also no option to export an executed notebook as slides with Reveal.js.

The next concern is the license, which is AGPL-v3. You can change the Notebooker code base, but you must release your changes under the same license when you distribute the software or offer it to users over a network. That might be a blocker for small and medium-sized businesses that want a private custom solution. What about support? Notebooker was created by a large investment company, and its contributors may help you solve your problems, but they are under no obligation to do so because you have no commercial contract with them. Even so, this solution looks interesting and is certainly worth checking.

4. Mercury

Mercury is an open-source framework for converting Jupyter Notebooks into web applications, with an option to schedule notebooks. It is written in TypeScript (React) and Python (Django).

GitHub repository Mercury

Mercury converts a Jupyter Notebook to a web app based on a YAML header added as the first raw cell of the notebook.

Convert Jupyter Notebook to web application

The YAML configuration has a schedule parameter that accepts a crontab string. An example YAML header that schedules a notebook:

---
title: Financial report
description: Stock financial report
schedule: '0 9 * * 1-5'
notify:
    on_success: contact@mljar.com
    attachment: pdf
show-code: False
params:
    ticker:
        input: select
        label: Select a ticker
        value: TSLA
        choices: [TSLA, TWTR, MSFT, SNOW, PLTR, NFLX]
    period:
        input: select
        label: Select period
        value: 3mo
        choices: [1mo, 2mo, 3mo, 6mo, 12mo, 24mo]
---

The scheduling part of the header:

schedule: '0 9 * * 1-5'
notify:
    on_success: contact@mljar.com
    attachment: pdf
show-code: False

  • the schedule parameter executes the notebook at 9:00 AM from Monday to Friday,
  • the notify parameter lists the email addresses that will receive the notebook as a PDF after a successful execution,
  • show-code: False means that the code will be hidden in the final notebook.

The notebook in this example is parametrized: the YAML header defines two parameters, period and ticker. They have default values, but the user can select different values and execute the notebook with them. An example custom execution is presented below:

Execute Parametrized Notebook in Mercury
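The idea of defaults that can be overridden within a fixed set of choices can be sketched in a few lines of Python. This is not Mercury's implementation; the `PARAMS` dictionary simply mirrors the `params` section of the YAML header above, and `resolve_params` is a hypothetical helper written for this illustration.

```python
# Plain-Python mirror of the `params` section of the YAML header.
PARAMS = {
    "ticker": {
        "value": "TSLA",
        "choices": ["TSLA", "TWTR", "MSFT", "SNOW", "PLTR", "NFLX"],
    },
    "period": {
        "value": "3mo",
        "choices": ["1mo", "2mo", "3mo", "6mo", "12mo", "24mo"],
    },
}


def resolve_params(spec, overrides=None):
    """Merge user overrides into the defaults, rejecting values outside `choices`."""
    overrides = overrides or {}
    unknown = set(overrides) - set(spec)
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    resolved = {}
    for name, definition in spec.items():
        value = overrides.get(name, definition["value"])
        if value not in definition["choices"]:
            raise ValueError(f"{value!r} is not a valid choice for {name!r}")
        resolved[name] = value
    return resolved


if __name__ == "__main__":
    # A scheduled run would use the defaults; a custom run overrides some of them.
    print(resolve_params(PARAMS))
    print(resolve_params(PARAMS, {"ticker": "MSFT"}))
```

A scheduled execution corresponds to calling the helper without overrides, while a custom execution passes the values selected by the user.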

An example email with the executed notebook attached as a PDF file:

Execute Parametrized Notebook in Mercury

You can share multiple notebooks with the Mercury framework. Each notebook gets a card with a small preview of its first cells:

Share multiple notebooks in Mercury

Mercury has user management. You can add as many users as you want. Restricting access to a notebook is as simple as adding one line with a share parameter to the YAML header. A notebook can be shared with selected users, groups, or a list of emails. You can read more in the documentation.

Authenticate User to restrict notebook access in Mercury

Mercury can convert a Jupyter Notebook into slides with Reveal.js. You can run it locally or in the cloud with the docker-compose command.

Mercury is dual-licensed. The open-source version is available under the AGPL-v3 license. Additional features (authentication), dedicated support, and private forks are provided with a commercial license. You can read more on the Mercury pricing website. It might be a great choice for small and medium-sized businesses looking for a Jupyter Notebook sharing solution.

5. Web-based Notebook Services

Schedule Notebooks in the Cloud

If you don't want to install any software locally or in the cloud, you can use one of the online services that offer Notebook-as-a-Service. A few popular Jupyter Notebook online services:

Those services come with Python environments fully packed with many Data Science and Machine Learning packages. You don't need to install anything; just write Python code.

However, not all of them support scheduling. The documentation links for services with scheduling features:

You need to check which service suits your needs, because they offer different sets of features. If you would like to share scheduled notebooks with other users, those users need accounts with the service. An online notebook service might be a good solution for you if:

  • you have no restrictions on storing your data and code in external cloud services,
  • you don't want to manage the remote server by yourself,
  • your company can afford a monthly subscription for you and your teammates.

Summary

Automating Jupyter Notebook scheduling can be a great time saver. Bare-bones application schedulers like cron and Windows Task Scheduler might be a good approach for tech-savvy users who can implement the functionality they need in a few lines of Bash or PowerShell script. Notebooker and Mercury are open-source solutions created for scheduling and sharing notebooks. Mercury offers a commercial license with dedicated support, additional features, and private forks; it might be an excellent solution for small and medium-sized companies looking for a customizable Jupyter Notebook scheduling solution. Cloud-based notebook scheduling services are a good option for companies that can store data and code in external cloud services and afford per-user pricing.
