UPDATED 09:00 EST / OCTOBER 20 2020

BIG DATA

Neo4j advances machine learning compatibility for its graph database

Graph database developer Neo4j Inc. is upping its machine learning game today with a new release of its Neo4j for Graph Data Science framework. The release leverages deep learning and graph convolutional neural networks to make data about graph connections more accessible to mainstream data science algorithms.

Specifically, release 1.4 adds graph embedding, a technique that calculates the shape of the surrounding network for each data element within a graph. Graph databases are distinguished by their ability to represent complex relationships as nodes, relationships and key-value properties, with each linked data item defined by a unique identifier. These connections can be traversed to find correlations that would be difficult or impossible to discover with relational tables because of the large number of joins that would be required.
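To make that traversal advantage concrete, here is a toy Python sketch (hypothetical data, not Neo4j's API): the graph is stored as adjacency lists, and finding everything within a fixed number of hops is a series of pointer lookups rather than repeated self-joins.

```python
from collections import deque

# Toy property graph: nodes carry key-value properties, connections are
# adjacency lists. Names and data are hypothetical, for illustration only.
nodes = {
    "alice": {"role": "analyst"},
    "bob": {"role": "engineer"},
    "carol": {"role": "manager"},
    "dave": {"role": "engineer"},
}
edges = {
    "alice": ["bob"],
    "bob": ["carol"],
    "carol": ["dave"],
    "dave": [],
}

def within_hops(start, max_hops):
    """Return nodes reachable from `start` in at most `max_hops` hops.
    Each hop is one adjacency lookup; a relational model would need one
    self-join per hop."""
    seen = {start}
    frontier = deque([(start, 0)])
    reachable = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for nbr in edges.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                reachable.append(nbr)
                frontier.append((nbr, depth + 1))
    return reachable

print(within_hops("alice", 2))  # ['bob', 'carol']
```

A three-join SQL query over an edge table would express the same two-hop question far less directly, and the join count grows with every additional hop.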

However, multidimensional graph relationships don’t map cleanly to the lower-dimensional vectors that are common in machine learning data sets. Graph embeddings make this mapping possible by sampling the topology and properties of the graph, reducing its complexity to just the significant features needed for downstream machine learning.

“Graph embedding learns the structure of your graph to improve your knowledge of the graph,” said Alicia Frame, Neo4j’s product manager for the Graph Data Science library. “It’s graduating from chasing pointers to running really fast queries.” Without the reduction in complexity, an adjacency matrix for a 5 billion-node graph would contain 5 billion squared elements. “This distills that giant graph into a computer representation of every node in your graph,” she said.
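The scale of that reduction is easy to check. A sketch of the arithmetic, using the article's 5 billion-node figure and an assumed (not Neo4j-specified) embedding dimension of 128:

```python
# Storage comparison: full adjacency matrix vs. fixed-length embeddings.
# The node count comes from the article; the embedding dimension of 128
# is a typical assumed choice, not a Neo4j default.
n_nodes = 5_000_000_000
adjacency_entries = n_nodes ** 2             # 5 billion squared = 2.5e19
embedding_dim = 128
embedding_entries = n_nodes * embedding_dim  # 6.4e11

print(f"adjacency matrix: {adjacency_entries:.1e} entries")
print(f"embeddings:       {embedding_entries:.1e} entries")
print(f"reduction factor: {adjacency_entries // embedding_entries:,}x")
```

Even at a generous embedding dimension, the fixed-length representation is tens of millions of times smaller than the full adjacency matrix.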

The enhancements significantly increase the scope of data science algorithms that can be run against a graph beyond the basic set that was included when the library was introduced in April. They’re part of Neo4j’s broader goal to take graph databases beyond queries of raw data to predict outcomes based on connections.

Specifically, the company is adding three new embedding options. First is Node2Vec, a popular graph embedding algorithm that uses neural networks to learn continuous feature representations for nodes, which can then be used for downstream machine learning tasks.
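Node2Vec's distinctive ingredient is its biased second-order random walk, whose samples are then fed to a skip-gram model to learn the node vectors. The walk step can be sketched in a few lines of Python; the graph and parameter values below are hypothetical, and the skip-gram training stage is omitted.

```python
import random

# Minimal sketch of Node2Vec's biased random walk. The return parameter p
# and in-out parameter q trade off BFS-like vs. DFS-like exploration.
graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}

def node2vec_walk(start, length, p=1.0, q=1.0, rng=random):
    """One biased walk: returning to the previous node is weighted 1/p,
    staying in its neighborhood 1, and moving outward 1/q."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = graph[cur]
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = []
        for nbr in nbrs:
            if nbr == prev:              # step back to the previous node
                weights.append(1.0 / p)
            elif nbr in graph[prev]:     # stay near the previous node
                weights.append(1.0)
            else:                        # explore outward
                weights.append(1.0 / q)
        walk.append(rng.choices(nbrs, weights=weights)[0])
    return walk

rng = random.Random(42)
print(node2vec_walk(0, 5, p=1.0, q=0.5, rng=rng))
```

A low q biases the walk outward (capturing community structure), while a low p keeps it local (capturing structural roles); the resulting walk corpus plays the role that sentences play in word embedding.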

FastRP (fast random projection) is a node-embedding algorithm that Neo4j says is up to 75,000 times faster than Node2Vec while delivering equivalent accuracy at extreme scale. Although it’s functionally equivalent to Node2Vec, Frame said many data scientists will likely use both.

“FastRP is lightning fast but more work to tune the embeddings to know what you want,” she said. “Many customers will run Node2Vec till they get results that make sense to them and then go to FastRP to run them at scale.”

GraphSAGE is an embedding algorithm and process for inductive representation learning on graphs that uses graph convolutional neural networks. Because the learned model is inductive rather than tied to a fixed node set, it can be applied continuously as the graph updates.
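The inductive property comes from how GraphSAGE computes an embedding: from a node's own features combined with an aggregate of its neighbors' features, so the same learned function applies to nodes added after training. A minimal sketch of one mean-aggregation layer, with hypothetical data and the trained weight matrix omitted:

```python
# Sketch of GraphSAGE's core step: embed a node from its own features plus
# an aggregate of its neighbors' features. In real GraphSAGE the combined
# vector is then multiplied by trained weights; here we stop at the
# concatenation for illustration.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}

def sage_layer(node):
    self_feat = features[node]
    nbrs = neighbors[node]
    # Mean-aggregate the neighbors' feature vectors.
    agg = [sum(features[m][d] for m in nbrs) / len(nbrs)
           for d in range(len(self_feat))]
    # Concatenate self features with the aggregate (stand-in for W*[h_v ; h_N(v)]).
    return self_feat + agg

print(sage_layer("a"))  # [1.0, 0.0, 0.5, 1.0]
```

Because the layer only needs a node's features and its current neighbors, a newly inserted node can be embedded immediately, which is what lets the technique run continuously as the graph changes.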

The upshot is that “we’re taking techniques that used to require a Ph.D. and democratizing them so anyone can download and have the power of graph predictions,” said Frame, who holds a Ph.D. “Before, we’d use a graph to store the data with the machine learning happening in Python. We’re connecting the dots.”

