Installation guide for setting up JupyterHub on Azure
The following are instructions for Linux users who want to set up and run JupyterHub on Microsoft Azure cloud services. Most of the instructions translate well to Unix/Mac users; only a few are Linux specific. There is more than one way to install JupyterHub, and the following instructions demonstrate a preference for a command line interface. Prerequisites include the installation of `az`, `kubectl`, and `helm` on a local machine.
JupyterHub runs on Linux/Unix operating systems and ships a ready-to-go Docker image; installation on Windows is not supported. Deployment on cloud services leverages the container orchestration software Kubernetes, minimizing dependencies on a specific cloud service provider and improving portability. Helm is a package manager for Kubernetes that manages the installation and updating of Kubernetes applications in coordination with the Tiller service.
Instructions are grouped in three parts, each with its own ‘Quick Start’ section. Further details and explanations are provided after each Quick Start section for those wanting more than a sequence of commands. The Quick Start sections and subsequent instructions make assumptions about the naming of directories, namespaces, accounts and releases.
Prerequisites
1. Azure Cloud Pay-As-You-Go Account
Microsoft offers a free-trial account (https://azure.microsoft.com/en-us/free/), but JupyterHub installation using a Free Trial subscription is bound to fail because the VM resources required by JupyterHub exceed the limits of that subscription (Fig 1). At the time of writing, Free Trial subscriptions are limited to four VM cores and are not eligible for limit or quota increases.
The installation described here requires six VM cores or more, so a Free Trial subscription cannot be used. Pay-As-You-Go accounts can be set up at the following URL: https://azure.microsoft.com/en-gb/pricing/purchase-options/pay-as-you-go/
2. Local dependencies
The cluster can be managed from a local machine. Installing the Azure CLI locally makes management easier and avoids having to visit the Azure web portal for every task. Install the following local dependencies, `az`, `kubectl`, and `helm`, by copying and pasting the commands into a terminal.
Azure Cli (`az`)
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
Helm (`helm`)
curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get | bash
Kubernetes Cluster Manager (`kubectl`)
sudo snap install kubectl --classic
OR
sudo apt-get update && sudo apt-get install -y apt-transport-https
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee -a /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubectl
OR
(mac only) -> brew install kubectl
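Once installed, you can confirm that all three tools are available on your PATH with the following version checks. Note that `helm version --client` reports only the local client; the server-side component, Tiller, is not installed until Part 2.
az --version
kubectl version --client
helm version --client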
Part 1: Kubernetes Cluster Setup
Quick Start
If you want to skip the explanations given in Part 1, Sections 1-8, all of the relevant commands are listed in sequence for your convenience. The local directory and cluster are assumed to be named ‘az-jupyterhub’, the resource group ‘comp689_jupyter_hub’, and the data center location ‘Canada Central’.
| DESCRIPTION | COMMAND |
| --- | --- |
| Login to Azure | `az login` |
| Verify subscription | `az account list --refresh --output table` |
| Create Resource Group | `az group create --name=comp689_jupyter_hub --location="Canada Central" --output=table` |
| Make local directory | `mkdir az-jupyterhub && cd az-jupyterhub` |
| Generate key pair | `ssh-keygen -f ssh-key-az-jupyterhub` |
| Create Kubernetes Cluster | `az aks create --name az-jupyterhub --resource-group comp689_jupyter_hub --ssh-key-value ssh-key-az-jupyterhub.pub --node-count 3 --node-vm-size Standard_D2s_v3 --output table` |
| Download Credentials | `az aks get-credentials --name az-jupyterhub --resource-group comp689_jupyter_hub --output table` |
| Verify Cluster | `kubectl get node` |
1. Login to Azure
Once the Azure command `az` is installed locally, the following command will prompt you to log in to the service through a browser interface. Only after completing this login step will you be able to create a Kubernetes cluster and communicate with the Azure portal.
az login
If this is a new account, you may be prompted that you “have no storage mounted” and that the shell feature requires an Azure file share to persist files; this is how login credentials are saved. Further prompts may inform you that creating a storage account will incur a small monthly cost. This is required in order to continue.
2. Verify Azure Subscription
Since you can scale up and manage many subscriptions from the command line, it is important to associate the following commands with the right account. If this is your first and only Azure subscription, the following command will list only one subscription:
az account list --refresh --output table
3. Create a Resource Group
In Azure, a resource group is the mechanism by which computational resources on the cloud service are allocated to one application and kept distinct from other applications; it requires a unique name and a location. In the following example the data center location is ‘Canada Central’ and the unique name is ‘comp689_jupyter_hub’.
az group create --name=comp689_jupyter_hub --location="Canada Central" --output=table
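If you prefer a different region, the data center locations available to your subscription can be listed with this optional command:
az account list-locations --output table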
4. Cluster Name
Some files need to be kept on your local machine. Choose a name for your cluster and create a local directory with the same name. This name should also be used, in part, to identify the ssh key pair associated with the cluster. In the example below, ‘az-jupyterhub’ was chosen.
mkdir az-jupyterhub
cd az-jupyterhub
5. Authorization
Authentication between your local machine and the Kubernetes cluster is facilitated by a public/private SSH key pair. Interacting with and configuring your cluster will rely on the files created in this next step. Run the following command, which generates a public/private key pair with a matching name, ‘ssh-key-az-jupyterhub’:
ssh-keygen -f ssh-key-az-jupyterhub
Note: You can replace the name ‘az-jupyterhub’ with your own name for your cluster.
6. Create an Azure Kubernetes Cluster
A request can now be made to create a Kubernetes cluster with the following details: the SSH public key (‘ssh-key-az-jupyterhub.pub’), the resource group (‘comp689_jupyter_hub’), and a cluster name (‘az-jupyterhub’).
az aks create --name az-jupyterhub \
  --resource-group comp689_jupyter_hub \
  --ssh-key-value ssh-key-az-jupyterhub.pub \
  --node-count 3 \
  --node-vm-size Standard_D2s_v3 \
  --output table
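If Standard_D2s_v3 is not offered in your region, or you want a different node size, the VM sizes available in a given location can be listed with this optional command (shown here for ‘Canada Central’):
az vm list-sizes --location "Canada Central" --output table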
7. Download Kubernetes Credentials
If the cluster creation is successful, configuration details (tokens, certificates, etc.) will also be created which link your account to the cluster. Download these credentials to your local machine with the following command:
az aks get-credentials \
  --name az-jupyterhub \
  --resource-group comp689_jupyter_hub \
  --output table
This will allow you to interact with your newly created Kubernetes cluster using the `kubectl` command on your local machine.
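As a quick sanity check, you can confirm that `kubectl` is now pointing at the new cluster; the current context should match the cluster name chosen above:
kubectl config current-context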
8. Check Kubernetes Cluster Functionality
If successful, the following command should list three running nodes:
kubectl get node
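If you also want to confirm that the cluster’s control plane is reachable, the following optional command prints the addresses of the Kubernetes master and core cluster services:
kubectl cluster-info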
Part 2: Helm / Tiller setup
Quick Start
If you want to skip the explanations given in Part 2, Sections 1-3, all of the relevant commands are listed for your convenience:
| DESCRIPTION | COMMAND |
| --- | --- |
| Tiller setup | `kubectl --namespace kube-system create serviceaccount tiller` |
| Tiller permissions | `kubectl create clusterrolebinding tiller --clusterrole cluster-admin --serviceaccount=kube-system:tiller` |
| Helm/Tiller start | `helm init --service-account tiller --wait` |
| Secure Tiller | `kubectl patch deployment tiller-deploy --namespace=kube-system --type=json --patch='[{"op": "add", "path": "/spec/template/spec/containers/0/command", "value": ["/tiller", "--listen=localhost:44134"]}]'` |
| Verify Helm | `helm version` |
Helm, a package manager for Kubernetes applications, works together with Tiller to describe and deploy resources within a cluster. Tiller acts as a service running in the cluster that interacts with Kubernetes; Helm is the client for that service. Helm charts describe deployment instructions, which are sent to the Tiller service, which in turn interacts with the Kubernetes cluster.
1. Tiller
Setup a service account for Tiller with the following command:
kubectl --namespace kube-system create serviceaccount tiller
Give that service account permission to manage the Kubernetes cluster:
kubectl create clusterrolebinding tiller --clusterrole cluster-admin --serviceaccount=kube-system:tiller
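Optionally, you can confirm that both the service account and its cluster role binding were created before moving on:
kubectl get serviceaccount tiller --namespace kube-system
kubectl get clusterrolebinding tiller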
2. Helm
The following command will set up the Helm client locally and start the Tiller service in the cluster.
helm init --service-account tiller --wait
It only has to be run once; afterwards, changes can be deployed with the Helm client, which tells Tiller what instructions to execute within the cluster.
3. Secure Tiller and Verify Helm
Since the Tiller service runs inside the cluster and has elevated permissions to control it, it is necessary to configure Tiller so that it only listens to commands from localhost and not from within the cluster. Leaving the port that Tiller uses open for probing would allow pods in the cluster to exploit Tiller’s elevated permissions. Secure Tiller with the following command:
kubectl patch deployment tiller-deploy --namespace=kube-system --type=json --patch='[{"op": "add", "path": "/spec/template/spec/containers/0/command", "value": ["/tiller", "--listen=localhost:44134"]}]'
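To confirm that the patch was applied, you can inspect the container command of the Tiller deployment; it should now include the `--listen=localhost:44134` flag:
kubectl get deployment tiller-deploy --namespace kube-system --output jsonpath='{.spec.template.spec.containers[0].command}'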
Then you can verify (Fig. 4) that Helm and Tiller are installed properly by ensuring that the Helm client and Tiller server versions match:
helm version
Part 3: JupyterHub Setup
Quick Start
If you want to skip the explanations given in Part 3, Sections 1-5, all of the relevant commands are listed for your convenience. Both the namespace and the release are assumed to be named ‘jhub’.
| DESCRIPTION | COMMAND |
| --- | --- |
| Create config file | `{ echo "proxy:"; echo "  secretToken: \"$(openssl rand -hex 32)\""; } \| tee config.yaml` |
| Add Helm repo | `helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/` |
| Update repo | `helm repo update` |
| Install JupyterHub | `helm upgrade --install jhub jupyterhub/jupyterhub --namespace jhub --version=0.8.2 --values config.yaml` |
| Validate Installation | `kubectl get pod --namespace jhub` |
| Get External IP | `kubectl get service --namespace jhub` |
1. Config file
Working from the local directory created earlier (the same directory that holds the SSH keys), a security token needs to be generated and then added to a `config.yaml` file. The following will generate a random string:
openssl rand -hex 32
Copy and paste the random string generated by the previous command into the `secretToken` field in the `config.yaml` file, formatted in the following way:
proxy:
  secretToken: <random_hex_value_here>
Note: you can combine both steps above with:
{ echo "proxy:"; echo "  secretToken: \"$(openssl rand -hex 32)\""; } | tee config.yaml
2. Helm repo
Next, make the Helm client aware of the JupyterHub Helm chart repository so that it knows where to find the latest Helm charts created by JupyterHub. The second command updates the local cache of charts from that repository.
helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm repo update
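Optionally, you can confirm that the repository was added and that the JupyterHub chart is now visible to the Helm client:
helm repo list
helm search jupyterhub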
3. Install JupyterHub
Now you’re ready to install JupyterHub! There are two values, the Helm release name and the Kubernetes namespace, which can be given the same value; `jhub` is used for both in the following example.
helm upgrade --install jhub jupyterhub/jupyterhub \
  --namespace jhub \
  --version=0.8.2 \
  --values config.yaml
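Once the command returns, you can optionally confirm that the release deployed and inspect its status with the Helm client:
helm list
helm status jhub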
4. Validation
Make sure that the pods are in the Running state:
kubectl get pod --namespace jhub
5. External Access
Get the external IP to access JupyterHub from a browser:
kubectl get service --namespace jhub
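The external IP appears in the EXTERNAL-IP column of the chart’s public proxy service, typically named `proxy-public` (it may show as ‘pending’ for a few minutes while Azure provisions the load balancer). If you want just the address, it can also be extracted directly:
kubectl get service proxy-public --namespace jhub --output jsonpath='{.status.loadBalancer.ingress[0].ip}'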
Enter the IP in a browser (Fig. 7):