JupyterLab sandbox in an Oracle Cloud Compute Instance

Few weeks ago I was writing about my own sandbox setup, which I run at home. But hey, we are in 2019! Why not run it in the cloud? The main reason was I didn’t need much power (CPU or memory) for my sandboxes and therefore my small host at home was more than enough.

No need to say that when I started working on Machine Learning on a graph of the whole Wikipedia site I quickly was out of resources (it only needed 200Gb of RAM…).
I then started setting up a sandbox in Oracle Cloud, to have the flexibility to have more resources than the limited hardware I have at home, and also to have an easier integration with other cloud services, like the database, without going into security and firewall configuration.

JupyterLab notebook on Oracle Cloud

As my previous post says, I’m fully into JupyterLab as my sandbox environment for both machine learning and property graphs activities. Some Oracle Cloud services come with Apache Zeppelin (like the ML notebooks with the Autonomous Database), but it isn’t really customizable or extensible.

This is why I stick to JupyterLab, not changing my habits and having full control on the environment and the various kernels I run.
There isn’t an Oracle Cloud service providing it directly, therefore I simply setup a Compute Instance (a virtual machine in the cloud) which I configure by installing Python 3.6 first, JupyterLab after and all the extra things I could need later.

First thing: setup a Virtual Cloud Network

One could directly start by creating a Compute Instance and the “cloud” will take care to create all the extra bits needed.
But doing things from the beginning first I create a Virtual Cloud Network, which I will be able to configure and name as I want instead of having a random generated name and default settings everywhere.

If you used virtual machines already, either VirtualBox or VMware or others, you are probably already used to have to deal with networking. Defining the virtual networks connected to the VMs depending on how isolate or reachable from outside it must be etc. In the cloud it isn’t much different.

A Virtual Cloud Network will be used by any Oracle Cloud service based on OCI. More and more services are like that and even Analytics (OAC) which wasn’t is being ported to OCI (deployment in progress).

From the main menu select “Networking” > “Virtual Cloud Networks” in the Core Infrastructure section.

And directly to “Create Virtual Cloud Network”, to add a new one.

Pick a name which make sense for you for your network, the name is also used (by default) for the DNS domain name.

The CIDR BLOCK is a parameter which can either set as the example to 10.0.0.0/16 (giving you up to 65’536 addresses available, probably more than enough) or set to something different. You have to keep a private network, so 10.x.x.x or 192.168.x.x or 172.16-31.x.x. If you have a whole network architecture in mind you can tailor it based on those needs, for a majority a classical 10.0.0.0/16 will be more than enough.

I also enable the DNS RESOLUTION as some other services, like the DBaaS, can use it. It will allow to reference instances by domain names instead of just IP addresses, which can be more friendly in knowing to what we are connecting.

Time to create a new Compute Instance

Once the network is done, it’s time to move on to the Compute Instance itself.

Select “Compute” > “Instances” from the main menu.

There are now a lot more options which can be set compared to the Virtual Cloud Network, but the interface will give you many defaults / pre-selected values making things easier.

Enter a proper name, allowing you to know what the instance is for. Select the image to use as OS for your instance. I generally use Oracle Linux 7.7 (8 exists but is still quite new and therefore I still don’t use it) or Oracle Autonomous Linux (which is based on Oracle Linux 7).

Select the type and shape. Virtual Machines is what I always pick as I don’t need a physical server, therefore it’s easier to create and drop them. The shape depends on your plans for the Compute Instance, based on the availability and your needs of CPU and memory. Keep in mind the cost is related to the shape.

Select the Virtual Cloud Network to connect this instance to (the one created previously or an existing one). If you don’t need to have a public IP address for the instance, it’s safer to not assign one: the instance will not be accessible from outside your Virtual Cloud Network.

Last one is really important: enter the public key you will use to connect to the instance. The Linux instances don’t have a password but require a private key to connect.

And your Compute Instance is being setup!

The provisioning took (for me) less than a minute, the Compute Instance quickly switched to “green”.
Once it is running you can find the private IP address (received by the Virtual Cloud Network) and, if you selected to get one, the public IP address you can use to connect from outside.

From inside the Virtual Cloud Network you can connect using the private IP. From outside you will need the public IP.

Connect via SSH to the Compute Instance

The Virtual Cloud Network has, by default, a rule allowing SSH (port 22) connections to instances. If your Compute Instance has a public IP address and you selected either Oracle Linux 7.7 or Oracle Autonomous Linux the username to use is opc. Other images can have other usernames, check the documentation to find out what it is.

Connecting via SSH to the public IP address, using the OPC username and the key selected when creating the instance.

Install JupyterLab

I always start by updating the whole system, updating all the installed packages by entering:

sudo yum update -y

1	sudo yum update -y

In Oracle Linux 7.7 there is by default Python 2 installed (2.7.5 at the time of writing). I prefer to use Python 3 for all my notebooks and also for the setup of JupyterLab.
Before to install it, I always check which repositories are enabled because Python 3 can be provided by more than one, and therefore I want to figure out if I need to disable a repository or enable one to install what I need.

Both “ol7_latest” and “ol7_developer_EPEL” can provide Python 3, I therefore use “ol7_latest” by disabling the EPEL one at install.

sudo yum install -y --disablerepo=ol7_developer_EPEL python3

1	sudo yum install -y --disablerepo=ol7_developer_EPEL python3

Once done I have now 2 different Python available.

Python 2.7.5 and Python 3.6.8 are both available.

JupyterLab use Node.js to compile extensions, by default an older version is available in the system and some extensions display warning messages that a newer Node.js would be better. Just like with Python 3 multiple repositories could provide Node.js, and by disabling the EPEL one again I can get the newer version.

sudo yum install -y oracle-nodejs-release-el7
sudo yum install -y --disablerepo=ol7_developer_EPEL nodejs

1 2	sudo yum install -y oracle-nodejs-release-el7 sudo yum install -y --disablerepo=ol7_developer_EPEL nodejs

Same package name, different repositories = different versions.

A quick check that both *Node.js* and *npm* are correctly installed.

Python virtual environments to keep things simple and clean

Just like two different versions of Python can coexist in the system, it’s often possible that different projects have different dependencies, maybe the same package is needed but in different versions.

To keep things simple and clean I always create Python virtual environments based on my needs. In this way all the packages I need are installed into an “isolated” environment without affecting the system-wide Python or any other Python virtual environment. It duplicates things on disk, but it’s a price I’m happy to pay to keep things clean.

python3 -m venv ~/py36env
# activate the virtual environment
source ~/py36env/bin/activate
# make sure the right python has been used
# (as you see now 'python' is Python 3)
python
# update PIP, the Python package manager system
pip install --upgrade pip

python3 -m venv ~/py36env

# activate the virtual environment

source ~/py36env/bin/activate

# make sure the right python has been used

# (as you see now 'python' is Python 3)

python

# update PIP, the Python package manager system

pip install --upgrade pip

Create a Python 3 virtual environment, enable it and update pip.

The Python virtual environment, when enabled, redefine the python and pip commands to point to the Python 3 of the virtual environment and the PIP of the virtual environment too.

Installing JupyterLab is a single command:

pip install jupyterlab

1	pip install jupyterlab

JupyterLab install in an automated way all the requirements it has.

Running JupyterLab

Everything is installed, JupyterLab must work…

Starting JupyterLab right after the install: not looking good.

By default, JupyterLab (and Jupyter) works locally and when started automatically open the browser pointing to the right address. This will work if you install the tool on your own laptop. To work as a proper Client-Server with JupyterLab running in a remote instance some extra configs are needed.
First the JupyterLab config file must be generated. And once the file is there, it has to be edited to add some extra settings.

Generate the Jupyter configuration file.

Append some configuration settings inside the file.

# Enable Jupyter to accept connection from outside the host running it
c.NotebookApp.allow_remote_access = True
# Accept connection come from any source
c.NotebookApp.ip = '*'
# Do not open a browser (as there isn't any and it's a remote setup)
c.NotebookApp.open_browser = False
# Use this folder as 'root' folder for the notebooks (need to create it)
c.NotebookApp.notebook_dir = u'/home/opc/jupyter'
# Instead of a random token, use this password as login
c.NotebookApp.password = u'…'

# Enable Jupyter to accept connection from outside the host running it

c.NotebookApp.allow_remote_access = True

# Accept connection come from any source

c.NotebookApp.ip = '*'

# Do not open a browser (as there isn't any and it's a remote setup)

c.NotebookApp.open_browser = False

# Use this folder as 'root' folder for the notebooks (need to create it)

c.NotebookApp.notebook_dir = u'/home/opc/jupyter'

# Instead of a random token, use this password as login

c.NotebookApp.password = u'…'

The password is encoded, it can be generated using this piece of code.

Once the configuration is done, it is possible to start JupyterLab (again) and hopefully this time everything works!

Starting JupyterLab after configuring the required settings.

The notebook can be tested by setting up an SSH tunnel forwarding requests to localhost:8888 on my laptop to the Compute Instance on port 8888 directly.

Jupyter prompt for the password defined in the configuration file. If no password is defined a random token will be written in the Jupyter log and it must be entered. I’m using an SSH tunnel to access it, that’s why I’m using “localhost”‘.

Welcome in JupyterLab your new sandbox running in an Oracle Cloud Compute Instance.

Enable access to JupyterLab using the public IP of the Compute Instance

Trying to access JupyterLab via the public IP address without configuring few more things isn’t really going to work.

The Compute Instance has a firewall running, to enable connections from outside to port TCP 8888, the default JupyterLab one, the firewall must be configured by opening the port.

sudo firewall-cmd --zone=public --permanent --add-port=8888/tcp
sudo firewall-cmd --reload

1 2	sudo firewall-cmd --zone=public --permanent --add-port=8888/tcp sudo firewall-cmd --reload

The current firewall settings. No open port TCP 8888.

The port must be added to the firewall (as permanent) and the firewall reloaded.

The Virtual Cloud Network must also be configured to allow access from outside to port 8888 because by default only port 22 for SSH is open.

Need to go back to the VCN to open the port as well.

Select the existing Security List (created by default when the VCN has been created).

Add a new Ingress Rule to allow connections to port 8888.

It’s a really good idea to define from where the service must be accessible. I have a fixed IP address, therefore I enter it as Source CIDR.

Done! JupyterLab is now accessible

Using the public IP address of the Compute Instance I can now access JupyterLab.

Right now, JupyterLab works because started via an SSH session, if I close the session JupyterLab will be stopped as well. It’s possible to setup a system service to get it started at boot of the Compute Instance.

It’s also possible to install new kernels, install extensions and any other setting you would need. You have full access to the Compute Instance, nothing is blocked.

TL;DR

Once you have a running Compute Instance based on Oracle Linux 7.7 or Oracle Autonomous Linux:

sudo yum repolist enabled
# is ol7_developer_EPEL enabled?
# if 'yes', need to exclude it for Python and Node.js install

# install python 3
sudo yum install -y --disablerepo=ol7_developer_EPEL python3

# install Node.js
sudo yum install -y oracle-nodejs-release-el7
sudo yum install -y --disablerepo=ol7_developer_EPEL nodejs

# create python 3 virtual env
python3 -m venv ~/py36env
source ~/py36env/bin/activate
# update pip
pip install --upgrade pip
# install &amp; configure JupyterLab
pip install jupyterlab
jupyter notebook --generate-config
# by default ~/.jupyter/jupyter_notebook_config.py
# add to the file:
  c.NotebookApp.allow_remote_access = True
  c.NotebookApp.ip = '*'
  c.NotebookApp.open_browser = False
  c.NotebookApp.password = u'sha1:4ac2e73c51ff:d84274597c8fe0f27ee99631cc3abf0d9300a26a' # 'oracle'
  c.NotebookApp.notebook_dir = u'/home/opc/jupyter'

# Oracle Linux firewall
sudo firewall-cmd --list-all
sudo firewall-cmd --zone=public --permanent --add-port=8888/tcp
sudo firewall-cmd --reload

# start JupyterLab
jupyter lab

sudo yum repolist enabled

# is ol7_developer_EPEL enabled?

# if 'yes', need to exclude it for Python and Node.js install

# install python 3

sudo yum install -y --disablerepo=ol7_developer_EPEL python3

# install Node.js

sudo yum install -y oracle-nodejs-release-el7

sudo yum install -y --disablerepo=ol7_developer_EPEL nodejs

# create python 3 virtual env

python3 -m venv ~/py36env

source ~/py36env/bin/activate

# update pip

pip install --upgrade pip

# install & configure JupyterLab

pip install jupyterlab

jupyter notebook --generate-config

# by default ~/.jupyter/jupyter_notebook_config.py

# add to the file:

c.NotebookApp.allow_remote_access = True

c.NotebookApp.ip = '*'

c.NotebookApp.open_browser = False

c.NotebookApp.password = u'sha1:4ac2e73c51ff:d84274597c8fe0f27ee99631cc3abf0d9300a26a' # 'oracle'

c.NotebookApp.notebook_dir = u'/home/opc/jupyter'

# Oracle Linux firewall

sudo firewall-cmd --list-all

sudo firewall-cmd --zone=public --permanent --add-port=8888/tcp

sudo firewall-cmd --reload

# start JupyterLab

jupyter lab

Recent Posts

Categories

Archives