Over the last year I explored the Oracle Graph Database landscape from all the points of view. One thing is sure: it’s the most polyglot solution I crossed so far. You can “do” graphs from almost any language or tool, including SQL.
Yes, you read it correctly: you can create, load, edit, drop Oracle graph databases just with SQL!

Graph Database by SQL in Oracle Database

Most of the time when looking for the Oracle graph database solution you end up with some Java code or one of the other languages (Python or Groovy etc.). People often forget that with just SQL you can do quite many things already.

In this post I’m going to go through an example in SQL Developer of how to create a graph, how to load it, how to do some queries and analysis and how to drop it when done.

I will be using an Oracle Database 12c R2, version 12.2.0.1 to be precise, as this is the first one supporting graphs (as far as I know at least). The graph is going to be stored in the database itself, making it extremely simple to load: if the source is the database itself there is no data leaving the server, all is done inside the database. For the source data the sample Order Entry (OE) schema will be used (customers, products, orders).

You didn’t deploy the sample schemas when installing your database?
Not a problem: you can download and deploy them afterward from GitHib.
https://github.com/oracle/db-sample-schemas

Oracle sample schema OE (Order Entry)

Oracle sample schema OE (Order Entry)

Obviously the database isn’t a native graph engine. If you need more advanced features for analysing your graph it could easily be loaded into PGX. Parallel Graph AnalytiX (PGX) is a toolkit for graph analysis, which also include a single-node in-memory engine or a distributed engine for extremely large graphs. We will look at this aspect in future blog posts.

Create the graph

First thing first is obviously to create the graph. Remember that before to do it, you will need to perform a simple configuration to extend the maximum size of varchar columns (required by the graph solution) and get your DB ready for graphs.

Zhe “Alan” Wu blogged about the steps to get the database ready for graphs, you can find his post at https://blogs.oracle.com/oraclespatial/graph-database-says-hello-from-the-cloud-part-iii.

One of the main requirements is to switch the max string size from standard to extended, which means a varchar could be up to 32’767 bytes instead of the standard 4’000. You can read more about this setting in the documentation.

The creation of the graph couldn’t be simpler than what it is: a single call to opg_apis.create_pg :

First parameter is the name of the graph, second one is the degree or parallelism and the others are defaults (look at the linked doc for the details). The degree of parallelism (DOP) for me is, keeping things simple, the total number of threads available in the system as I want to use as much resources as possible to make things faster (1 CPU with 2 cores with 2 threads each = 1 * 2 * 2 = 4).

After this command you will see, in the schema you used to execute the creation of the graph, some extra tables, the result of the CREATE_PG command.

Create a graph by SQL: 1 command and 5 tables are created

Create a graph by SQL: 1 command and 5 tables are created

The 5 tables created are named with the graph name and a suffix:

  • GE$ : edges of the graph
  • VT$ : vertices (nodes) of the graph
  • GT$ : graph skeleton
  • IT$ : text index metadata
  • SS$ : graph snapshots

Keeping it really simple you can have a graph by simply focusing on the GE$ and VT$ tables and ignoring the others. For this example it’s what I will do as the other 3 tables are for different and more advanced usages.

Graph tables structure

The structure of the vertices and edges tables is quite simple as the number of columns is limited. To make it easier I added comments to each column describing what its role is (so don’t be surprised if your tables will not have a single comment, it’s normal).

Graph table for vertices structure

Graph table for vertices, the comments describe each column

Graph table for edges structure

Graph table for edges, the comments describe each column

A graph stored in the database using the VT$ and GE$ tables uses the PG / FLAT-FILE format and has the following specifications: vertices IDs as integer or long, vertices labels aren’t supported, edges IDs as long, edges label is supported but a single one per edge.

One of the few things to pay attention at is the property type and its value for both edges and vertices: this is a 100% graph thing. The type is a number telling the graph what data type is the value of the given property, the list of options is available in the PGX documentation (https://docs.oracle.com/cd/E56133_01/latest/reference/loader/file-system/plain-text-formats.html).

Graph property type values

List of values for various properties types as visible in the PGX documentation

A given property in a graph must have a unique and single type. Nothing makes that check in the database structure, but that’s how graphs works, so make sure to not mix types for the same property (a same property name in edges and vertices with different types is possible as it relate to different objects).

Once the type is set the value of the property need to be stored in the “V” column if it’s a string. Double stored in the “V” (the string version of it) and “VN” if it is a numeric value, respectively “VT” if it is a date-time value. In this way, with the double storage, the original value is preserved (for calculations and algorithms) and a textual version is available and easy to display without having to pay too much attention at the type and performing transformation or casting.

Create vertices (nodes) by SQL

We saw above the structure of the vertices table, an ID and if needed properties for a vertex. To create a vertex we simply need to insert data into the table respecting that rule. The vertex ID need to be unique but repeated for all the rows with different properties which are related to the same graph vertex (there is a unique constraint on VID-K).

In this example I will create vertices for the customers and products of the OE sample schema as the graph will represent orders passed by customers (so a customer order one or many products: customers and products are vertices, orders will be edges in the graph connection the user and the product ordered).

I wrote a simple query taking some properties of the customers and products and transforming them into vertices. As the customer ID and product ID in the database are already unique, I can use them directly as vertex ID instead of creating a new unique sequence. (double click to see the full code)

Create edges by SQL

As edges will represent the orders done by customers and the database store order and order items separately there isn’t an already available column acting as edge ID. But this value can be generated by a simple RANK over the orders and order items lines to preserve the same edge ID over the various properties the edge will have. The source vertex ID will be the customer ID and the destination vertex ID will be the product ID. All the information describing the order, as quantity and amounts, will be stored as edge properties. The main difference between edges and vertices is that edges can have a label, but in this case I will simply have a unique value of “order” as label.
There isn’t a unique constraint on SVID-DVID-K because there can be multiple edges connecting the same customer to the same products if somebody ordered many times a product. But there is a unique constraint on EID-K as a property must be single for each edge ID.

Done! The graph is loaded!

Graph loaded by SQL

Graph loaded by SQL inserting values in the vertices (VT$) and edges (GE$) tables

Visualize the graph

The graph can be queried by SQL as it is 2 tables which can be joined as required but it can also be used as any other normal Oracle graph. For example a nice and simple way to see the result is by using Cytoscape with the Oracle plugin for graph visualization (http://www.oracle.com/technetwork/database-options/spatialandgraph/downloads/index-156999.html).

Cytoscape to visualize the graph loaded by SQL

Cytoscape to visualize the graph loaded by SQL

In future blog posts I will go more in details on how to perform analysis on graphs with Cytoscape and PGX. At the moment the number of analysis doable by SQL is limited to counting triangles and page rank. The documentation list all the available methods in the OPG_APIS package and each method come with a simple example making it easy to use.

Run a page rank calculation

Not really the best ever graph for a page rank but let’s see how this kind of calculation can be easily done by SQL. There are 2 methods in the OPG_APIS package, one to prepare the calculation of the page rank and one doing the actual calculation. The output of the preparation is important as it shows values which will be needed at the end to clean up the page rank calculation, so keep a note of the values returned.

Page rank calculation in the database

Page rank calculation in the database

The page rank is ready to be queried. The result is stored into the table named after the ‘wt_pr’ parameter (“oesample” in my case). The “node” column of the page rank result table is the vertex ID, while the “pr” column is the page rank score.

Result of the page rank calculation

Result of the page rank calculation

Finally once the page rank results aren’t useful anymore it’s time to clean up. Another method exists for that and require the output of the preparation command as arguments.

The various tables created by page rank have been deleted.

Delete the graph once done

When we are done using the graph we can drop it as easily as we created it.

One single call to DROP_PG passing the graph name as parameter and all the tables and indexes created will all be dropped including all the data.

In just a matter of minutes and by using SQL in SQL Developer we saw how to create, load and drop a graph. It couldn’t be easier than that. No need to write a single line of Java, no need to use Python, no need to use Groovy: just normal SQL.
Stay tuned for more about graphs …

Share This