Bare metal

JupyterHub can be deployed on bare-metal systems. This is how the Gravity Exploration Institute at Cardiff University deployed it to make a set of Python environments (3.6, 3.7 and 3.8) available to its users.

The main difference from a traditional JupyterHub installation is that conda is used to install the full environment, rather than the recommended combination of pip and conda.

The main steps are:

  1. Install JupyterHub dependencies:

    • Nodejs

    • npm

    • systemd

    • httpd

    • miniconda

  2. Create a JupyterHub group for SudoSpawner

  3. Create a JupyterHub user

  4. Create a sudoers file

  5. Create a configuration file

  6. Create a conda environment file for JupyterHub

  7. Create a sudospawner configuration file

  8. Create JupyterHub’s systemd service file

  9. Create JupyterHub’s httpd configuration file

  10. Create JupyterHub’s static kernels

  11. Provision the kernels with the required environments

  12. Create a suitable script to start JupyterHub

  13. Optionally create a suitable logo to display in JupyterHub

  14. Start the JupyterHub service

An Ansible playbook is available to help automate this process.

Installing Miniconda

The first step is to make sure that conda 4.8.3 or later is available on the system, and to download and install Miniconda otherwise:

miniconda/tasks/main.yml:

---
- name: check for existing miniconda
  stat:
    path: "{{ miniconda_conda_bin }}"
  changed_when: false
  register: miniconda_conda

- name: get installed miniconda version
  command: "{{ miniconda_conda_bin }} --version"
  changed_when: false
  register: installed_conda_version
  when: miniconda_conda.stat.exists

- name: check installed miniconda version
  set_fact:
    installed_conda_version: "{{ installed_conda_version.stdout | regex_search(version_output, '\\1') | first }}"
  vars:
    version_output: 'conda (.+)'
  when: installed_conda_version.stdout is defined

# install miniconda
- when: not miniconda_conda.stat.exists or installed_conda_version is version(miniconda_min_version, '<')
  block:
    - name: download miniconda installer
      get_url:
        url: "{{ miniconda_installer_url }}"
        dest: "/tmp/{{ miniconda_installer_url | basename }}"
        mode: "0755"

    - name: install miniconda
      command: "bash /tmp/{{ miniconda_installer_url | basename }} -b -p {{ miniconda_prefix }}"
      become: true
      args:
        creates: "{{ miniconda_prefix }}"

    - name: delete miniconda installer
      file:
        path: "/tmp/{{ miniconda_installer_url | basename }}"
        state: absent

The variables used above are defined in the role’s defaults and vars files:

miniconda/defaults/main.yml

miniconda_prefix: /opt/miniconda3
miniconda_min_version: "4.8.3"
miniconda_installer_url: https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

miniconda/vars/main.yml

---
miniconda_conda_bin: "{{ miniconda_prefix }}/condabin/conda"
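The install condition has to compare versions numerically: as plain strings, "4.10.3" sorts before "4.8.3", so a newer conda would be treated as too old. The following Python sketch mirrors the role's logic (extract the version from `conda --version` with the same `conda (.+)` pattern, then compare it with miniconda_min_version). The function names are illustrative, not part of the role, and it assumes purely numeric version components.

```python
import re
from typing import Optional, Tuple

MIN_VERSION = "4.8.3"  # mirrors miniconda_min_version in defaults/main.yml


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '4.8.3' into (4, 8, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def installed_version(conda_output: str) -> str:
    """Extract the version from `conda --version` output, e.g. 'conda 4.8.3'."""
    match = re.search(r"conda (.+)", conda_output)
    if match is None:
        raise ValueError(f"unexpected conda output: {conda_output!r}")
    return match.group(1).strip()


def needs_install(conda_output: Optional[str], min_version: str = MIN_VERSION) -> bool:
    """True if conda is missing (None) or older than min_version."""
    if conda_output is None:
        return True
    return version_tuple(installed_version(conda_output)) < version_tuple(min_version)


if __name__ == "__main__":
    print("4.10.3" < "4.8.3")             # string comparison gets this wrong
    print(needs_install("conda 4.10.3"))  # numeric comparison: newer, no install
    print(needs_install("conda 4.8.2"))   # older than the minimum: install
    print(needs_install(None))            # conda not found: install
```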

Installing other dependencies

Depending on the Linux distribution, the way to install the remaining dependencies may vary. This guide has been tested on CentOS 7.

In this step we install the following, if they are not already available:

  • nodejs

  • npm

  • systemd

  • httpd

For Node.js, it may be necessary to install the NodeSource repository first. This can be done by downloading the repository setup script and running it locally. The following Ansible role performs the installation:

node/tasks/main.yml

---
- name: download nodejs repo installer
  get_url:
    url: "https://rpm.nodesource.com/setup_15.x"
    dest: "/tmp/nodejsrepo"
    mode: "0755"

- name: Install nodejs repository
  command: "bash /tmp/nodejsrepo"
  become: true

# npm is installed as part of nodejs
- name: install jupyterhub deps
  yum:
    name:
      - nodejs
      - systemd
      - httpd
    state: present
  become: true
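Since only missing dependencies need installing, a quick pre-flight check can report which required commands are already on the PATH. A minimal Python sketch using shutil.which; the command names (node, npm, systemctl, httpd) are assumptions about the executable each package provides on CentOS 7, and httpd usually lives in /usr/sbin, which may not be on every user's PATH.

```python
import shutil
from typing import Dict, List, Optional

# Assumed mapping from dependency (package) to the executable it provides.
REQUIRED_COMMANDS: Dict[str, str] = {
    "nodejs": "node",
    "npm": "npm",
    "systemd": "systemctl",
    "httpd": "httpd",
}


def missing_dependencies(required: Optional[Dict[str, str]] = None) -> List[str]:
    """Return the packages whose executable cannot be found on the PATH."""
    required = REQUIRED_COMMANDS if required is None else required
    return [pkg for pkg, cmd in required.items() if shutil.which(cmd) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all dependencies found")
```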

Authentication

For demonstration purposes, we show how to configure authentication via PAM (LIGO uses Shibboleth). The authentication method is defined in the JupyterHub configuration file:

jupyterhub/templates/jupyterhub_config.py.j2

# 10 is logging.DEBUG: verbose logs, useful while setting up the Hub
c.Application.log_level = 10
# By commenting out the authenticator class,
# JupyterHub falls back to using PAM
#c.JupyterHub.authenticator_class = 'jhub_remote_user_authenticator.remote_user_auth.RemoteUserLocalAuthenticator'
c.ConfigurableHTTPProxy.command = '/opt/jupyterhub/bin/configurable-http-proxy'
# Spawn single-user servers through sudo so the Hub does not need root
c.JupyterHub.spawner_class = 'sudospawner.SudoSpawner'
c.SudoSpawner.sudospawner_path = '/opt/jupyterhub/bin/sudospawner'
# Make JupyterLab the default
c.Spawner.default_url = '/lab'
# A Hub running without root cannot open PAM sessions
c.PAMAuthenticator.open_sessions = False

With the authenticator class commented out, JupyterHub falls back to its default PAM authenticator.

IRIS IAM

It is also possible to authenticate using IRIS IAM. This requires some configuration changes and additional packages.

  • Dependencies: OAuthenticator should be installed using pip. It can be added to the original conda environment definition:

    jupyterhub/files/jupyterhub-environment-iris-iam.yml

    name: jupyterhub
    channels:
      - conda-forge
    dependencies:
      - configurable-http-proxy
      - jupyterlab
      - pip
      - python=3.8
      - sudospawner
      - pip:
        - jhub_remote_user_authenticator
        - "--editable=git+https://github.com/jupyterhub/oauthenticator.git@master#egg=oauthenticator"
    
  • Client registration: JupyterHub needs to be registered as an IAM client in IRIS IAM:

    [Figure: IRIS IAM dashboard (iris-iam-dashboard.png)]

    From here, follow the instructions on the INDIGO IAM documentation site. In the Redirect URI(s) field, enter the public IP address and port of the server running JupyterHub, in the form of the callback URL used in the configuration below (http://<JH-IP>:<PORT>/hub/oauth_callback).

    Save the credentials for your client: you will need them to configure JupyterHub and to modify the client’s settings later on.

  • JupyterHub configuration: GenericOAuthenticator is used to interact with IRIS IAM. It requires a few settings, including the client ID and client secret obtained in the previous step. We also need to provide the address to which the user is redirected after successful authentication; this address must match the one defined during the client registration step.

    IAM authentication requires defining a few environment variables and making them visible to JupyterHub. In our current setup, the JupyterHub user (jupyterhub) is a non-interactive nologin user, so one way to define these variables is to set them in the JupyterHub configuration file using the os module.

    Keep in mind that our current setup requires each authenticated user to have a matching account on the system running JupyterHub. That is, if user “John” authenticates in IRIS IAM, JupyterHub’s spawner (SudoSpawner in our case) will try to start a Jupyter server for the local user “John”, and will fail if that user cannot be found.

    Users can also be filtered by IAM group with the allowed_groups option. For example, we can restrict access to our hub to members of the jupyterhub-da/stfccloud group. NOTE: allowed_groups, together with other options described on the OAuthenticator website, is not supported in the latest OAuthenticator release at the time of writing (0.13.0), so we need to use the master branch from GitHub. This may change in future releases.

    c.Application.log_level = 10
    c.JupyterHub.spawner_class = 'sudospawner.SudoSpawner'
    c.SudoSpawner.sudospawner_path = '/opt/jupyterhub/bin/sudospawner'
    # Make JupyterLab the default
    c.Spawner.default_url = '/lab'
    c.ConfigurableHTTPProxy.command = '/opt/jupyterhub/bin/configurable-http-proxy'
    
    # Authenticator
    import os

    # OAuth endpoints and callback for IRIS IAM, read by OAuthenticator from the
    # environment (the jupyterhub user is nologin, so they are set here)
    os.environ['OAUTH2_AUTHORIZE_URL'] = 'https://iris-iam.stfc.ac.uk/authorize'
    os.environ['OAUTH_CALLBACK_URL'] = 'http://<JH-IP>:<PORT>/hub/oauth_callback'
    os.environ['OAUTH2_TOKEN_URL'] = 'https://iris-iam.stfc.ac.uk/token'

    from oauthenticator.generic import GenericOAuthenticator
    c.JupyterHub.authenticator_class = GenericOAuthenticator
    c.GenericOAuthenticator.login_service = 'IRIS IAM'
    # Client credentials saved during client registration
    c.GenericOAuthenticator.client_id = '<COPY_FROM_IRIS_IAM_CLIENT_REGISTRATION>'
    c.GenericOAuthenticator.client_secret = '<COPY_FROM_IRIS_IAM_CLIENT_REGISTRATION>'
    c.GenericOAuthenticator.userdata_url = 'https://iris-iam.stfc.ac.uk/userinfo'
    c.GenericOAuthenticator.token_url = 'https://iris-iam.stfc.ac.uk/token'
    c.GenericOAuthenticator.userdata_method = 'GET'
    c.GenericOAuthenticator.userdata_params = {'state': 'state'}
    # Local account name = IAM preferred_username (must exist on this host)
    c.GenericOAuthenticator.username_key = 'preferred_username'
    # Must match the Redirect URI registered in IRIS IAM
    c.GenericOAuthenticator.oauth_callback_url = 'http://<JH-IP>:<PORT>/hub/oauth_callback'
    # Only members of this IAM group may log in
    c.GenericOAuthenticator.allowed_groups = ['jupyterhub-da/stfccloud']
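The two access requirements described above, membership of a group listed in allowed_groups and a matching local account, can be illustrated with a simplified Python model. This is not OAuthenticator's code: it assumes the IRIS IAM userinfo response carries the username under preferred_username (as configured with username_key) and the user's groups in a claim named groups, and it checks the local account with the standard pwd module, since SudoSpawner needs a local user of that name.

```python
import pwd
from typing import Iterable, Optional

USERNAME_KEY = "preferred_username"           # c.GenericOAuthenticator.username_key
ALLOWED_GROUPS = {"jupyterhub-da/stfccloud"}  # c.GenericOAuthenticator.allowed_groups
GROUPS_CLAIM = "groups"                       # assumption: claim holding IAM groups


def authorized_username(userinfo: dict, allowed_groups: Iterable[str] = ALLOWED_GROUPS) -> Optional[str]:
    """Return the username if the user belongs to an allowed group, else None.

    In this simplified model an empty allowed set rejects everyone.
    """
    username = userinfo.get(USERNAME_KEY)
    if not username:
        return None
    groups = set(userinfo.get(GROUPS_CLAIM, []))
    return username if groups & set(allowed_groups) else None


def has_local_account(username: str) -> bool:
    """The spawner can only start a server for users known to the local system."""
    try:
        pwd.getpwnam(username)
    except KeyError:
        return False
    return True


if __name__ == "__main__":
    info = {"preferred_username": "john", "groups": ["jupyterhub-da/stfccloud"]}
    name = authorized_username(info)
    print(name, has_local_account(name) if name else False)
```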
    

Once this is configured, users navigating to the JupyterHub server address are greeted by a page like this:

[Figure: JupyterHub sign-in page for IRIS IAM (iris-iam-jh-sign-in.png)]

They are then redirected to the IRIS IAM login page:

[Figure: IRIS IAM login page (iris-iam-sign-in.png)]

Each user must approve our client the first time they use it. After approval, the user is redirected to the Jupyter server spawned by the Hub.

[Figure: IRIS IAM client approval page (iris-iam-approval.png)]
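Behind the redirect described above, the browser is sent to the IRIS IAM authorization endpoint with a standard OAuth 2.0 authorization-code request (RFC 6749, section 4.1.1). The following Python sketch builds such a URL from the OAUTH2_AUTHORIZE_URL and OAUTH_CALLBACK_URL variables set in the configuration. The client ID and the scope value are placeholders, and the exact parameters GenericOAuthenticator sends may differ.

```python
import os
import secrets
from typing import Tuple
from urllib.parse import parse_qs, urlencode, urlparse

# Same variables as in jupyterhub_config.py; the fallbacks repeat those values.
AUTHORIZE_URL = os.environ.get("OAUTH2_AUTHORIZE_URL", "https://iris-iam.stfc.ac.uk/authorize")
CALLBACK_URL = os.environ.get("OAUTH_CALLBACK_URL", "http://<JH-IP>:<PORT>/hub/oauth_callback")


def authorization_request(client_id: str, scope: str = "openid profile") -> Tuple[str, str]:
    """Return (url, state) for an OAuth 2.0 authorization-code request."""
    state = secrets.token_urlsafe(16)  # random value, compared again at the callback
    query = urlencode({
        "response_type": "code",       # ask IAM for an authorization code
        "client_id": client_id,        # from the client registration step
        "redirect_uri": CALLBACK_URL,  # must match a registered Redirect URI
        "scope": scope,                # placeholder scope value
        "state": state,
    })
    return f"{AUTHORIZE_URL}?{query}", state


if __name__ == "__main__":
    url, state = authorization_request("<COPY_FROM_IRIS_IAM_CLIENT_REGISTRATION>")
    params = parse_qs(urlparse(url).query)
    print(url)
    print(params["redirect_uri"] == [CALLBACK_URL])
```

The state value is what lets the Hub reject callbacks it did not initiate, which is why it is generated per request rather than fixed.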

Spawner

LIGO uses SudoSpawner, one of the custom spawners for JupyterHub, to start each single-user notebook server. This spawner lets JupyterHub run without root privileges by spawning an intermediate process via sudo, which improves system security. In the JupyterHub configuration file, it is enabled with spawner_class, and sudospawner_path points to the sudospawner executable. In addition, SudoSpawner requires setting up the user that will actually run the Hub and defining which commands that user is allowed to execute on behalf of users. This is done via a couple of configuration files:

A systemd unit file for JupyterHub that defines the user running the Hub and the directory from which the jupyterhub command is invoked:

jupyterhub/templates/jupyterhub.service.j2

[Unit]
Description=JupyterHub
Requires=firewalld.service
After=network-online.target

[Service]
User={{ jupyter_server_user }}
Environment="PATH={{ jupyterhub_prefix }}/bin:/sbin:/bin:/usr/sbin:/usr/bin"
ExecStart={{ jupyterhub_prefix }}/bin/jupyterhub
WorkingDirectory={{ jupyterhub_config_directory }}
Restart=on-failure

[Install]
WantedBy=multi-user.target

And a sudoers file that defines the command JupyterHub’s user is allowed to execute. Servers can only be spawned for users who are members of a particular group (the LIGO group in LIGO’s case):

jupyterhub/templates/jupyterhub.sudofile.j2

# The command(s) the Hub can run on behalf of users without needing a
# password. The exact path may differ, depending on how sudospawner was installed
Cmnd_Alias JUPYTER_CMD = {{ jupyterhub_prefix }}/bin/sudospawner

# Give the Hub user permission to run the above command as any member of
# the group below, without prompting for a password
{{ jupyter_server_user }} ALL=(%{{ jupyter_server_sudo_group }}) NOPASSWD:JUPYTER_CMD
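The `(%{{ jupyter_server_sudo_group }})` Runas specification lets the Hub user run sudospawner only as users who belong to that group. A Python sketch of an equivalent membership test, using the standard pwd and grp modules, can help check in advance which accounts will be able to get a server. It considers both the user's primary group and the group's supplementary member list; the function name is illustrative.

```python
import grp
import pwd


def in_sudo_group(username: str, group: str = "LIGO") -> bool:
    """True if `username` is a member of `group` (primary or supplementary).

    The default group mirrors jupyter_server_sudo_group in defaults/main.yml.
    """
    try:
        user = pwd.getpwnam(username)
        target = grp.getgrnam(group)
    except KeyError:  # unknown user or unknown group
        return False
    return user.pw_gid == target.gr_gid or username in target.gr_mem


if __name__ == "__main__":
    for name in ("root", "john"):
        print(name, in_sudo_group(name))
```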

Defaults

It is useful to define default values for some of the parameters used in our configuration files. Keeping them in a separate file makes it easier to adapt these templates to different deployments.

jupyterhub/defaults/main.yml

---
igwn_conda_root: "/cvmfs/oasis.opensciencegrid.org/ligo/sw/conda"
igwn_env_name:
  - igwn-py36
  - igwn-py36-proposed
  - igwn-py36-testing
  - igwn-py37
  - igwn-py37-proposed
  - igwn-py37-testing
  - igwn-py38
  - igwn-py38-proposed
  - igwn-py38-testing
jupyter_server_user: jupyterhub
jupyter_server_sudo_group: LIGO
jupyterhub_config_directory: "/etc/jupyterhub"
jupyterhub_log_directory: "/var/log/jupyterhub"
jupyterhub_datadir: "/usr/local/share/jupyter"
jupyterhub_prefix: "/opt/jupyterhub"
...
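The igwn_env_name list drives the kernel-provisioning tasks in the main task file: each environment gets its own directory under jupyterhub_datadir/kernels containing a kernel.json, a start.sh and a logo. The kernel.json.j2 and start.sh.j2 templates are not reproduced in this guide, so the following Python sketch only shows one plausible shape for them: a standard Jupyter kernelspec whose argv runs start.sh, and a start.sh that activates the environment from igwn_conda_root and launches ipykernel. Both file contents are assumptions.

```python
import json
import stat
import tempfile
from pathlib import Path

CONDA_ROOT = "/cvmfs/oasis.opensciencegrid.org/ligo/sw/conda"  # igwn_conda_root

# Hypothetical start.sh: activate the environment, then exec the IPython kernel.
START_SH = """#!/bin/bash
source {conda_root}/etc/profile.d/conda.sh
conda activate {env}
exec python -m ipykernel_launcher "$@"
"""


def write_kernel(datadir: Path, env: str, conda_root: str = CONDA_ROOT) -> Path:
    """Create <datadir>/kernels/<env>/ with kernel.json and an executable start.sh."""
    kernel_dir = datadir / "kernels" / env
    kernel_dir.mkdir(parents=True, exist_ok=True)

    start = kernel_dir / "start.sh"
    start.write_text(START_SH.format(conda_root=conda_root, env=env))
    start.chmod(start.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

    spec = {
        # Jupyter substitutes {connection_file} when it launches the kernel.
        "argv": [str(start), "-f", "{connection_file}"],
        "display_name": env,
        "language": "python",
    }
    (kernel_dir / "kernel.json").write_text(json.dumps(spec, indent=2))
    return kernel_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for env in ["igwn-py37", "igwn-py38"]:  # subset of igwn_env_name
            print(write_kernel(Path(tmp), env))
```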

Main tasks

This task file runs the main tasks required to deploy our JupyterHub service.

jupyterhub/tasks/main.yml

---
# CREATE CONDA ENVIRONMENT
- name: create temporary directory
  tempfile:
    state: directory
    suffix: jupyterhub
  register: jupyterhub_tempdir

- name: copy jupyterhub environment file
  copy:
    src: jupyterhub-environment.yml
    dest: "{{ jupyterhub_tempdir.path }}/environment.yml"
    owner: root
    group: root
    mode: "0644"
  register: jupyterhub_environment_yaml

- name: check jupyterhub environment exists
  stat:
    path: "{{ jupyterhub_prefix }}"
  register: jupyterhub_environment

- name: create jupyterhub environment
  command: "{{ miniconda_conda_bin }} env create
    --file {{ jupyterhub_environment_yaml.dest }}
    --prefix {{ jupyterhub_prefix }}
    --quiet"
  when: not jupyterhub_environment.stat.exists

- name: update jupyterhub environment
  command: "{{ miniconda_conda_bin }} env update
    --file {{ jupyterhub_environment_yaml.dest }}
    --prefix {{ jupyterhub_prefix }}
    --quiet"
  when: jupyterhub_environment.stat.exists

- name: delete tempdir
  file:
    path: "{{ jupyterhub_tempdir.path }}"
    state: absent
# END CREATE CONDA ENVIRONMENT

# SUDOSPAWNER SETUP
# This may not be strictly needed, because sudospawner is installed as part
# of the conda environment and everybody has permission to execute it
- name: create jupyterhub group for sudospawner
  group:
    name: "{{ jupyter_server_sudo_group }}"
    state: present

- name: create jupyterhub user
  user:
    name: "{{ jupyter_server_user }}"
    comment: "jupyterhub server user"
    system: "yes"
    state: present
    createhome: "no"
    shell: /sbin/nologin
    groups: "{{ jupyter_server_sudo_group }}"

- name: copy jupyterhub sudoers file
  template:
    src: jupyterhub.sudofile.j2
    dest: /etc/sudoers.d/jupyterhub
    owner: root
    group: root
    mode: "0644"
    validate: /usr/sbin/visudo -cf %s

- name: copy Jupyter sudospawner config file over
  template:
    src: sudospawner-singleuser.j2
    dest: "{{ jupyterhub_prefix }}/bin/sudospawner-singleuser"
    owner: root
    group: root
    mode: "0755"
## END SUDOSPAWNER SETUP

## JHUB CONFIGURATION
- name: create jupyterhub config directory
  file:
    path: "{{ jupyterhub_config_directory }}"
    state: directory
    owner: root
    group: "{{ jupyter_server_sudo_group }}"
    mode: "0775"

- name: copy jupyterhub config file over
  template:
    src: jupyterhub_config.py.j2
    dest: "{{ jupyterhub_config_directory }}/jupyterhub_config.py"
    owner: "{{ jupyter_server_user }}"
    group: root
    mode: "0644"

- name: copy systemd service file over
  template:
    src: jupyterhub.service.j2
    dest: /etc/systemd/system/jupyterhub.service
    owner: root
    group: root
    mode: "0644"
## END JHUB CONFIGURATION

- name: copy static jupyter kernels
  copy:
    src: kernels
    dest: "{{ jupyterhub_datadir }}"
    owner: root
    group: root
    mode: preserve

- name: provision IGWN jupyter kernels
  file:
    path: "{{ jupyterhub_datadir }}/kernels/{{ item }}"
    state: directory
    owner: root
    group: root
    mode: "0755"
  with_items:
    - "{{ igwn_env_name }}"

- name: create IGWN jupyter kernel.json
  template:
    src: kernel.json.j2
    dest: "{{ jupyterhub_datadir }}/kernels/{{ item }}/kernel.json"
  with_items:
    - "{{ igwn_env_name }}"

- name: create IGWN jupyter start.sh
  template:
    src: start.sh.j2
    dest: "{{ jupyterhub_datadir }}/kernels/{{ item }}/start.sh"
    mode: "0755"
  with_items:
    - "{{ igwn_env_name }}"

- name: create IGWN jupyter logo
  copy:
    src: igwn-logo-64x64.png
    dest: "{{ jupyterhub_datadir }}/kernels/{{ item }}/logo-64x64.png"
  with_items:
    - "{{ igwn_env_name }}"

- name: start jupyterhub service
  systemd:
    name: jupyterhub
    state: started
    enabled: yes
    daemon_reload: yes
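The environment tasks at the top of this task file make the role re-runnable: conda env create is used the first time, and conda env update on later runs once jupyterhub_prefix exists. A Python sketch of the same decision, building the argument list rather than running it; the paths are the defaults used in this guide.

```python
from pathlib import Path
from typing import List

CONDA_BIN = "/opt/miniconda3/condabin/conda"  # miniconda_conda_bin with the default prefix
PREFIX = "/opt/jupyterhub"                      # jupyterhub_prefix


def conda_env_command(env_file: str, prefix: str = PREFIX, conda: str = CONDA_BIN) -> List[str]:
    """Build `conda env create` the first time, `conda env update` afterwards."""
    action = "update" if Path(prefix).exists() else "create"
    return [conda, "env", action, "--file", env_file, "--prefix", prefix, "--quiet"]


if __name__ == "__main__":
    # Pass the list to subprocess.run(cmd, check=True) to actually execute it.
    print(" ".join(conda_env_command("environment.yml")))
```

Checking for the prefix directory rather than parsing `conda env list` keeps the test cheap and matches the stat check used in the role.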

Running deployment

The roles above can be run with an Ansible playbook:

jhub.yml

#- hosts: all
- hosts: 127.0.0.1
  roles:
   - miniconda
   - node
   - { role: jupyterhub, become: yes }
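Because the play targets 127.0.0.1, one way to run it is against an inline inventory with a local connection. A Python sketch that assembles the ansible-playbook command line; -i with a trailing comma, --connection local and --ask-become-pass are standard ansible-playbook options, while the wrapper function itself is illustrative.

```python
import shlex
from typing import List


def playbook_command(playbook: str = "jhub.yml", ask_become_pass: bool = True) -> List[str]:
    """Build an ansible-playbook invocation that targets the local host."""
    cmd = [
        "ansible-playbook",
        "-i", "127.0.0.1,",       # inline inventory: trailing comma makes it a host list
        "--connection", "local",  # run tasks on this machine, no SSH
    ]
    if ask_become_pass:
        cmd.append("--ask-become-pass")  # prompt for the sudo password used by become
    cmd.append(playbook)
    return cmd


if __name__ == "__main__":
    # Pass the list to subprocess.run(cmd, check=True) to actually execute it.
    print(shlex.join(playbook_command()))
```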