I wrote about setting up a JupyterLab sandbox in an Oracle Cloud Compute Instance not long ago. Now that I have it, I obviously want to use it for some experiments with Machine Learning and graphs. For that I often use an Oracle database as source for my data, and it’s also a cloud service.
How can I connect from Python (my JupyterLab notebook) to an Oracle DBaaS database? Must be easy using cx_Oracle, the Python interface for Oracle database.
DBaaS, Autonomous, Classic …
There are various different database products in the Oracle Cloud offer, and sometime they also use different names to link to the same thing (depending which page you are on).
To me it’s like there are 2 types mainly: autonomous and not autonomous.
The autonomous is split into Autonomous Data Warehouse and Autonomous Transaction Processing. It works well for many things but still doesn’t support property graphs (work in progress…) and that’s why I use something else.
The not autonomous part is either named Database or Bare Metal, VM, and Exadata depending which page you look at. This is what I’m using: just a database running in a VM. But patched and maintained not by me manually via SSH but via the cloud web interface.
Classic was the original, older kind of cloud infrastructure, and isn’t available for new instances anymore (at least for me clicking it send me to some generic content).
To keep things simple, I’m using a Compute Instance running in the same exact Virtual Cloud Network and subnet as my database, to avoid any networking issue: the 2 instances are connected to the same network with nothing supposed to block the communication. Just like 2 servers being next to each other and connected to the same dumb switch.
Install the required pieces…
To test my connection, I have an extremely simple code: connect, run a query, get the result, disconnect.
I don’t expect any issue: the username and password for the database work in SQL Developer, the hostname, port and service name come directly from the DBaaS page where I manage the service.
To connect Python to an Oracle Database cx_Oracle is the official package to use. It’s well documented and widely used with many blog posts and tutorials about it. Still it has to be installed!
All it takes is: pip install cx_Oracle into my Python virtual environment where JupyterLab is installed (installing on the system Python wouldn’t make it available inside the virtual environment).
Once the package is installed, I’m ready to go again, except that now the error is at the connection step. cx_Oracle still needs an Oracle Client to be available on the same server where Python is running.
Installing the Oracle Client is pretty easy as it’s available in YUM. The basiclite version of the client is more than enough, no need to install any other “bigger” client (except if you need some of the TNS tools or SQL*Plus).
If you don’t see any Oracle Client available in your YUM repositories, or you look for a newer version (up to 19.5 is available at the time of writing), look at https://blogs.oracle.com/linux/oracle-instant-client-rpms-now-available-on-oracle-linux-yum-server-yumoraclecom on how to define the repository giving you more choices.
Even with the Oracle Client installed the error is the same. This means it’s better read the whole error message instead of just the first words.
The message has a link, https://oracle.github.io/odpi/doc/installation.html#oracle-instant-client-rpm , which explain in detail what has to be done (installing the Oracle Client via a YUM repository is similar to a local RPM install).
It’s simple … once I took the time to read it!
Because I installed an 18.3 client, I have to add the client to the runtime link path myself. Not having any other Oracle product installed (and which could have set that already in a different way) on the Compute Instance the fix is simple:
After installing the required pieces, it must work … maybe
cx_Oracle is installed, the Oracle Client is installed, the runtime link path set correctly: everything is ready and I can finally connect to the database.
This error generally means the client (my Python code) and the server can’t communicate properly or one isn’t replying to the other.
I first thought it was an issue resolving the hostname maybe. On the Compute Instance I can easily check with nslookup. Apparently, it wasn’t a resolver issue, the hostname point to the correct IP.
After many tests, port scans, TCP dumps to capture all the traffic being exchanged, googling and some guessing I found out that the Security List of the Virtual Cloud Network subnet also plays a role!
The Security List, defined on a subnet of a VCN, act as a kind of generic firewall applying to all and every single host being part of the subnet.
They aren’t rules applying to traffic coming in or leaving the subnet, they are really at the host level.
To allow 2 hosts in the same subnet to communicate, the Security List must allow it with an Ingress and Egress rule.
In my case the Egress Rules allows any kind of traffic to be initiated from hosts in the subnet, therefore only Ingress Rules must be set to allows connection on TCP 1521, the default port used by Oracle Database (listed in the DBaaS service page).
In this particular case my database and Compute Instance are both connected to a subnet 10.0.0.0/24 being part of a 10.0.0.0/16 VCN.
If I only wants hosts in the subnet to connect to the database, I can set 10.0.0.0/24 as source CIDR.
If I have other subnets defined (a “/16” VCN can have up to 256 “/24” subnets) which also need access to the database, you can use a larger source CIDR. (Some networking and subnets IP calculations knowledge definitely helps when dealing with these things.)
In my mind the Security List was more acting as a kind of firewall applied on a gateway, only affecting traffic leaving or entering the subnet, not for inner-only traffic between hosts in the subnet. It’s important to keep that in mind when setting up a multi-host setup, which requires components to communicate internally between various OCI instances.
The %sql magic function in JupyterLab
In JupyterLab you don’t always need cx_Oracle and the whole piece of Python code to execute simple queries. There is a magic function which makes SQL extremely easy. This also works in notebooks using a different kernel than Python.
After installing the ipython-sql package, it will require to load the SQL extensions and to connect to a database (the connection string is slightly different than in cx_Oracle). After you can use %sql followed by a single-line of SQL query or %%sql for a multi-line query.
It also works to directly assign the result of a query to a Pandas DataFrame.