JupyterLab with Apptainer

Running JupyterLab on a Compute-Cluster

HPC
Python
Published: May 20, 2024

Modified: June 5, 2024

Abstract

JupyterLab is a web-based interactive environment for working with code and data. It is widely used in scientific computing and is well suited to running on a compute cluster. This article illustrates how to add JupyterLab to an Apptainer container image, and how to start a container instance on HPC infrastructure.

Why Use JupyterLab?

JupyterLab 1 is an excellent choice for scientific computing due to its flexibility, customization options, and ability to support a wide range of programming languages, libraries, and tools. Here are some reasons why you might choose to use JupyterLab and Jupyter Notebooks 2 in scientific computing:

  1. Ease of Use: JupyterLab is designed to be user-friendly, with features like auto-completion, syntax highlighting, and debugging capabilities that can help reduce the learning curve for new users.
  2. Reproducibility: Jupyter Notebooks (.ipynb files) are self-contained documents that include code, output, and visualizations, making it easy to reproduce results and share them with others.
  3. Easy Data Exploration: You can load and visualize various data formats, such as CSV, Excel, JSON, and more, using popular libraries like Pandas, NumPy, and Matplotlib.
  4. Interactive Visualizations: Jupyter notebooks integrate with popular visualization libraries like Plotly, Bokeh, and Altair, enabling you to create interactive, web-based visualizations of your data.
  5. Flexibility: JupyterLab supports a wide range of programming languages, including Python, R, Julia, and MATLAB, allowing you to use the language that best suits your needs.
  6. Version Control: You can use version control systems like Git to manage changes to your notebooks and collaborate with others.
  7. Extensions for Specific Domains: There are many extensions available for JupyterLab that can help with specific tasks in scientific computing.

Jupyter Containers

The Jupyter team maintains a set of Jupyter Docker Stacks 3 for typical use cases, with corresponding public container images on DockerHub 4 and Quay 5. These images can be used on the compute cluster as well:

Log in to a cluster submit node and start a JupyterLab container:

apptainer run docker://jupyter/base-notebook

The JupyterLab service daemon prints log information to the terminal, including the connection address and the so-called access token described in the next section.

Stop JupyterLab by pressing CTRL-C once you have finished working.

Access Token

stdout
# Information from the logs of the JupyterLab service
Or copy and paste one of these URLs:
    http://lxbk0725:12345/lab?token=598fb99446ce46996e9a74ea28bea3d2c7fd2591be4431f6
    http://127.0.0.1:12345/lab?token=598fb99446ce46996e9a74ea28bea3d2c7fd2591be4431f6

A Jupyter Access Token 6 like the example above is a secure, randomly generated string that authenticates and authorizes users to access JupyterLab notebooks and resources, providing an additional layer of security and control over notebook sharing and collaboration.

The URL printed together with the access token includes the host name and port number that you need to connect.

In the simplest case you can use the HTTP address that includes the host name of the executing node (lxbk0725 in the example above) to connect to JupyterLab with your web browser. If you access the cluster from an external network, you need to set up SSH port forwarding as explained in the next section.
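The parts of such a URL can be separated with plain shell parameter expansion, which helps when the host, port, and token are needed individually, for example to set up port forwarding. The following is a minimal sketch; `parse_jupyter_url` is a hypothetical helper name, and it assumes the URL always carries an explicit port, as in the log output above.

```bash
# Split a JupyterLab URL into host, port, and token.
# parse_jupyter_url is a hypothetical helper, not part of Jupyter.
# Assumption: the URL has the form http://HOST:PORT/lab?token=TOKEN
parse_jupyter_url() {
    url=$1
    # Drop the scheme, then everything from the first '/' onwards
    hostport=${url#*://}
    hostport=${hostport%%/*}
    # Split HOST:PORT at the colon
    host=${hostport%%:*}
    port=${hostport##*:}
    # Everything after 'token=' is the access token
    token=${url##*token=}
    printf '%s %s %s\n' "$host" "$port" "$token"
}

parse_jupyter_url 'http://lxbk0725:12345/lab?token=598fb99446ce46996e9a74ea28bea3d2c7fd2591be4431f6'
# prints: lxbk0725 12345 598fb99446ce46996e9a74ea28bea3d2c7fd2591be4431f6
```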

Remote Access

Access to JupyterLab from outside networks requires an additional step. SSH Port Forwarding is a technique that creates a secure, encrypted tunnel between your local machine and a remote server, allowing you to access the remote server’s services or applications as if they were running locally by forwarding specific ports from the remote server to your local machine.

Use ssh to configure port forwarding to JupyterLab running on the cluster:

ssh -vv -N -L localhost:12345:lxbk0725.gsi.de:12345 virgo.hpc.gsi.de
# …logs will be printed to the terminal …use ctrl-c to close

Once port forwarding is running, use the HTTP address with the IP address 127.0.0.1 (localhost) to connect to JupyterLab.
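The argument of `-L` follows the pattern local-address:local-port:target-host:target-port. As a sketch, a small helper can assemble the command from the node name and port shown in the JupyterLab URL. The name `forward_command` is hypothetical, and the optional fourth argument is an assumption for the case where the port is already taken on your local machine.

```bash
# Print the ssh command that forwards a local port to JupyterLab on a node.
# Arguments: <node> <port> <login-host> [local-port]
# forward_command is a hypothetical helper; it only prints the command.
forward_command() {
    node=$1
    port=$2
    login=$3
    # Use the same port locally unless a different one is requested
    local_port=${4:-$port}
    # -N: run no remote command, -L: forward local_port to node:port
    printf 'ssh -N -L localhost:%s:%s:%s %s\n' \
        "$local_port" "$node" "$port" "$login"
}

forward_command lxbk0725.gsi.de 12345 virgo.hpc.gsi.de
# prints: ssh -N -L localhost:12345:lxbk0725.gsi.de:12345 virgo.hpc.gsi.de
forward_command lxbk0725.gsi.de 12345 virgo.hpc.gsi.de 8888
# prints: ssh -N -L localhost:8888:lxbk0725.gsi.de:12345 virgo.hpc.gsi.de
```

If you choose a different local port, replace the port in the 127.0.0.1 URL accordingly when you open it in the browser.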

Build a Container

The following section assumes that you want to add JupyterLab to an existing container. Alternatively, you could use the Jupyter Docker Stacks as a foundation and build on them by adding your own components on top.

To integrate JupyterLab into your Apptainer container, you’ll typically extend an existing configuration that includes your working environment. The following assumes that Python 3 is available in your container and uses a Python virtual environment to install JupyterLab 7.

Adding JupyterLab to an Apptainer Container Definition File 8:

apptainer.def
%post
# …add in an appropriate place in the post-section
mkdir /app
cd /app
python3 -m venv venv
. venv/bin/activate
# install JupyterLab including some extensions
pip3 install \
        jupyter \
        jupyterlab-spellchecker \
        jupyterlab-git \
        jupyterlab-lsp \
        'python-lsp-server[all]'

%environment
export JUPYTERLAB_PORT=54321

%runscript
. /app/venv/bin/activate
exec jupyter lab --no-browser --ip 0.0.0.0 --port $JUPYTERLAB_PORT

%startscript
. /app/venv/bin/activate
exec jupyter lab --no-browser --ip 0.0.0.0 --port $JUPYTERLAB_PORT

The example above prepares the use of apptainer instance 9 to manage your JupyterLab installation:

Apptainer also allows you to run containers in a “detached” or “daemon” mode where the container runs a service. A “service” is essentially a process running in the background…

The example above enables the use of an environment variable:

Variable Description
JUPYTERLAB_PORT Defines the default port used when a JupyterLab instance is started. The configuration above sets the port to 54321. If you share a host with other users, you may need to change the port number to avoid collisions.
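One way to reduce the chance of collisions on a shared host is to derive the port from your numeric user ID instead of hard-coding it. This is only a sketch: `user_port` is a hypothetical helper, the range 20000 to 29999 is an arbitrary assumption, and the function does not check whether the port is actually free.

```bash
# Derive a port between 20000 and 29999 from a numeric user ID.
# user_port is a hypothetical helper; the port range is an assumption
# and may need adjusting to local firewall or site policy.
user_port() {
    # Use the given ID, or fall back to the ID of the current user
    uid=${1:-$(id -u)}
    echo $((20000 + $uid % 10000))
}

# Variables prefixed with APPTAINERENV_ are passed into the container,
# where this one replaces the default JUPYTERLAB_PORT
export APPTAINERENV_JUPYTERLAB_PORT=$(user_port)
echo "JupyterLab port: $APPTAINERENV_JUPYTERLAB_PORT"
```

Two users whose IDs differ by a multiple of 10000 would still get the same port, so treat the result as a sensible default rather than a guarantee.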

Build the container and copy the image to shared storage:

apptainer build apptainer.sif apptainer.def                 # (1)

# Make sure to store the container image on persistent storage, for example Lustre
export LUSTRE_HOME=/lustre/$(id -ng)/$USER
export APPTAINER_CONTAINERS=$LUSTRE_HOME/containers
# Create the target directory in case it does not exist yet
mkdir -p $APPTAINER_CONTAINERS
cp apptainer.sif $APPTAINER_CONTAINERS/jupyterlab.sif       # (2)

  1. Build the Apptainer container image from the definition file 10
  2. Copy the JupyterLab container image to your Lustre directory

Start the container instance on the current host:

export APPTAINERENV_JUPYTERLAB_PORT=12345              # (1)
apptainer run $APPTAINER_CONTAINERS/jupyterlab.sif     # (2)

  1. Override the default port configuration with an environment variable
  2. Start your JupyterLab instance on the local node

Start a JupyterLab instance on a cluster node:

export LUSTRE_HOME=/lustre/$(id -ng)/$USER ; cd $LUSTRE_HOME                 # (1)
srun --nodes=1 --ntasks-per-node=1 --time=01:00:00 --pty bash -i             # (2)

# …once your allocation has been granted
export APPTAINER_CONFIGDIR=$LUSTRE_HOME/.apptainer                           # (3)
apptainer instance start $LUSTRE_HOME/containers/jupyterlab.sif jupyterlab   # (4)

  1. Configure your working directory, typically on shared storage
  2. Allocate an interactive session on a compute node with srun
  3. Make sure the Apptainer configuration path is located on writable storage
  4. Start your JupyterLab instance on the compute node, using the image copied to the containers directory earlier
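The instance is launched in the background, so its log may not contain the connection URL for the first few seconds. The following sketch polls a log file until a URL with a token appears. The function name `wait_for_token_url` and the default timeout of 60 seconds are assumptions, and the function deliberately skips the 127.0.0.1 URL in favour of the one with the node host name.

```bash
# Poll a log file until a JupyterLab URL with a token shows up.
# Arguments: <logfile> [timeout in seconds, default 60]
# wait_for_token_url is a hypothetical helper name.
wait_for_token_url() {
    logfile=$1
    timeout=${2:-60}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # Keep only the URL with the node host name, not the loopback one
        url=$(grep -o 'http://[^ ]*/lab?token=[0-9a-f]*' "$logfile" 2>/dev/null \
            | grep -v '127\.0\.0\.1' | sort -u | head -n 1) || true
        if [ -n "$url" ]; then
            echo "$url"
            return 0
        fi
        sleep 1
        elapsed=$(($elapsed + 1))
    done
    echo "no JupyterLab URL found in $logfile after ${timeout}s" >&2
    return 1
}

# Example call on the compute node, using the instance log location:
#   wait_for_token_url \
#       "$APPTAINER_CONFIGDIR/instances/logs/$(hostname)/$USER/jupyterlab.err"
```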

Read the access token from the log files of your JupyterLab instance:

>>> cat $APPTAINER_CONFIGDIR/instances/logs/$(hostname)/$USER/jupyterlab.err \
        | grep -o 'http.*lab?token.*' | sort | uniq
http://127.0.0.1:54321/lab?token=041ded5856f8bfe5202c5118027016f990390729ff7d0433
http://lxbk0724:54321/lab?token=041ded5856f8bfe5202c5118027016f990390729ff7d0433
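Unlike a container started with apptainer run, an instance is not attached to your terminal and has to be stopped explicitly once you have finished working. A minimal sketch using the `apptainer instance list` and `apptainer instance stop` subcommands follows; the wrapper `stop_instance` and the `DRY_RUN` variable are hypothetical conveniences that let you preview the commands without executing them.

```bash
# List running instances, then stop one by name.
# stop_instance and DRY_RUN are hypothetical conveniences:
# with DRY_RUN set, the commands are printed instead of executed.
stop_instance() {
    name=${1:-jupyterlab}
    # Expands to 'echo' when DRY_RUN is set, otherwise to nothing
    run=${DRY_RUN:+echo}
    $run apptainer instance list
    $run apptainer instance stop "$name"
}

# Preview the commands for the instance named in this article
DRY_RUN=1 stop_instance jupyterlab
# prints:
#   apptainer instance list
#   apptainer instance stop jupyterlab
```

Call `stop_instance jupyterlab` without `DRY_RUN` on the compute node to actually stop the instance.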