Graph databases don’t have tables and columns like relational ones. Instead, they have nodes, edges, facets, and predicates.
A node is an object or entity, also known as a vertex.
An edge is a relationship between two nodes.
A facet is an attribute of an edge.
A predicate is an attribute of a node.
Loosely translating this from a relational point of view:
Use a node with its type set instead of a table
Use an edge instead of a foreign key reference
Use a facet instead of an additional column on a join table
Use a predicate instead of a column
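As a rough illustration, a person row with a foreign key to a company might be expressed in Dgraph's RDF triple format like this (all names here are hypothetical):

```
_:alice <dgraph.type> "Person" .          # node with its type set (≈ table)
_:alice <name> "Alice" .                  # predicate (≈ column)
_:alice <works_at> _:acme (since=2020) .  # edge (≈ foreign key) with a facet (≈ join-table column)
_:acme <dgraph.type> "Company" .
_:acme <name> "Acme" .
```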
Relationships between data in a graph database are first-class citizens. Instead of doing complicated work to join data together, nodes are simply connected by pointers, which results in some major speed improvements.
Using Big O notation, with the two tables labeled M and N, and the result of walking a set of graph nodes labeled k, the time complexities are as follows1:
RDBMS nested loop join: O(M×N)
RDBMS hash join: O(M+N)
Graph traversal: O(k)
When working with relational databases, it’s common to have to save data to many tables in a transaction. The basic flow is:
Insert a record into table A
Get the resulting primary key ID for table A
Insert a record into table B with table A's primary key as a foreign key reference
Repeat for any other joined tables
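The steps above can be sketched with Python's built-in sqlite3 module (the author/book tables and all names are illustrative):

```python
import sqlite3

# In-memory database standing in for a relational store.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT NOT NULL,"
    " author_id INTEGER NOT NULL REFERENCES author(id))"
)

with conn:  # wraps both inserts in a single transaction
    cur = conn.execute("INSERT INTO author (name) VALUES (?)", ("Ada Lovelace",))
    author_id = cur.lastrowid  # step 2: read back the generated primary key
    conn.execute(
        "INSERT INTO book (title, author_id) VALUES (?, ?)",
        ("Sketch of the Analytical Engine", author_id),
    )

row = conn.execute(
    "SELECT a.name, b.title FROM book b JOIN author a ON a.id = b.author_id"
).fetchone()
```

Note that if any step fails partway through a batch, it is the application's job to roll back or retry; the database only enforces that the foreign keys it is given are valid.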
The responsibility for referential integrity rests solely on the application using the database. This becomes considerably more complicated when batch-inserting many records along with their accompanying foreign table references.
For graph databases, such as Dgraph, the query language has a built-in mechanism to put the work where it belongs. A mutation can contain temporary identifiers called Blank Nodes2, written as _:identifier. Dgraph creates a UID for each of these, persists them to disk, and returns them in the result set.
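For example, a single set mutation can create and link two new nodes using blank-node identifiers (names hypothetical):

```
{
  set {
    _:alice <name> "Alice" .
    _:alice <works_at> _:acme .
    _:acme <name> "Acme" .
  }
}
```

Dgraph's response includes a map from each blank-node name to the UID it assigned, so the application never has to fetch keys between inserts.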
Relational schemas must be defined and created before any data can be saved. If a new column is needed, it must be explicitly added. If the column carries constraints such as NOT NULL, existing rows must be backfilled with a default value before the constraint can be applied. On a large table, this can take hours or days.
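For instance, adding a NOT NULL column in PostgreSQL-style SQL typically takes three steps (table and column names hypothetical); the backfill in the middle is the part that can run for hours on a large table:

```sql
ALTER TABLE users ADD COLUMN status TEXT;            -- add as nullable first
UPDATE users SET status = 'active';                  -- backfill every existing row
ALTER TABLE users ALTER COLUMN status SET NOT NULL;  -- then add the constraint
```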
By contrast, the schema and storage layers are separate in Dgraph.3 If a new predicate is used, the database will simply start storing it. Node types and predicates can be defined either before or after data is present.
The schema can be modified by adding, dropping, or changing predicates to reflect business requirement changes, while reducing the overall downtime needed to run migrations.
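As a sketch, a Dgraph schema alteration is just a list of predicate definitions, which can be applied before or after data exists (predicate names hypothetical):

```
name: string @index(term) .
age: int .
works_at: uid @reverse .
```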
An ORM, or Object-Relational Mapping tool, is essentially a way to transpile code into SQL and convert the results of a query back into data structures.
SQL is declarative, which makes it wonderful for analysis and ad hoc queries, but requires a fair amount of work to compose from an application.
With Dgraph, queries use a derivative of GraphQL called GraphQL+-4 and return their results as JSON. GraphQL is quickly gaining popularity over RESTful APIs because each API call can be tailored to a specific request. Dgraph takes this one step further by adding functions to GraphQL, which are used to filter results and traverse the graph.
Joins are done by simply asking for embedded data in the GraphQL request, and the results can be parsed by existing JSON decoders.
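A sketch of such a query in GraphQL+-, assuming a name predicate with an index and a works_at edge (both hypothetical):

```
{
  people(func: eq(name, "Alice")) {
    name
    works_at {    # nesting the block replaces an explicit join
      name
    }
  }
}
```

The response is plain JSON with works_at embedded under each person, so an ordinary JSON decoder is all the mapping layer an application needs.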
It should be noted that an ORM is still likely needed with Neo4j's Cypher language and GQL, a new standard derived from it. GQL aims to be a graph query language that complements SQL.5 However, since Cypher is vastly different from most frontend APIs, it would require work to transpile.
NoSQL databases, such as document and graph stores, were invented out of the need to scale to today's data volumes.
Dgraph was designed from the outset to be horizontally scalable, fast, concurrent, and transactional. It handles shard rebalancing and synchronous replication out of the box so losing a hard drive or server won’t bring down the application.6
Dgraph is open source and doesn’t require a commercial license for these features. Neo4j, on the other hand, has free community and startup editions for companies earning less than $3MM USD per year.7 The commercial version can get expensive. The last time someone gave me a quote, it was about $20k per server.
Relational databases were designed for spinning magnetic hard drives, which achieve around 100 IOPS per drive, whereas modern NVMe SSDs can achieve 500k-10MM IOPS.8 Storage is the area of computing that has seen the largest speed gains in recent times.
Dgraph's storage engine, Badger, is optimized for modern SSDs.9 As a result, Dgraph's overall memory requirements are quite a bit lower than those of alternatives such as RocksDB or Neo4j.