UPDATED 09:00 EDT / OCTOBER 20 2020

Neo4j advances machine learning compatibility for its graph database

Graph database developer Neo4j Inc. is upping its machine learning game today with a new release of its Neo4j for Graph Data Science framework, which uses deep learning and graph convolutional neural networks to make data about graph connections more accessible to mainstream data science algorithms.

Specifically, release 1.4 adds graph embedding, a technique that calculates the shape of the network surrounding each data element within a graph. Graph databases are unique for their ability to represent complex relationships using nodes, relationships and key-value pairs that define linked data items using a unique identifier. These connections can be traversed to find correlations that would be difficult or impossible to discover using relational tables because of the large number of joins that would be required.

However, multidimensional graph relationships don’t map cleanly to the lower-dimensional vectors that are common in machine learning data sets. Graph embeddings bridge that gap by sampling the topology and properties of the graph, reducing its complexity to just the significant features needed for further machine learning.

“Graph embedding learns the structure of your graph to improve your knowledge of the graph,” said Alicia Frame, Neo4j’s product manager for the Graph Data Science library. “It’s graduating from chasing pointers to running really fast queries.” Without the reduction in complexity, an adjacency matrix for a 5 billion-node graph would have to have 5 billion-squared elements. “This distills that giant graph into a computer representation of every node in your graph,” she said.
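To put that reduction in perspective, here is a rough back-of-the-envelope calculation. The 5 billion-node figure comes from the example above; the embedding size of 128 values per node is an assumption chosen purely for illustration and was not cited by Neo4j.

```python
# Back-of-the-envelope comparison: dense adjacency matrix vs. node embeddings.
NUM_NODES = 5_000_000_000   # 5 billion nodes, the figure quoted in the article
EMBEDDING_DIM = 128         # assumed embedding size (illustrative, not from Neo4j)


def adjacency_entries(num_nodes: int) -> int:
    """Entries in a dense num_nodes x num_nodes adjacency matrix."""
    return num_nodes * num_nodes


def embedding_entries(num_nodes: int, dim: int) -> int:
    """Entries when every node is stored as one vector of length dim."""
    return num_nodes * dim


dense = adjacency_entries(NUM_NODES)
compact = embedding_entries(NUM_NODES, EMBEDDING_DIM)

print(f"dense adjacency matrix: {dense:.2e} entries")
print(f"{EMBEDDING_DIM}-dim embeddings: {compact:.2e} entries")
print(f"reduction factor: {dense // compact:,}x")
```

Under that assumption, the embedding table is tens of millions of times smaller than the dense matrix, which is the kind of compression that makes graph structure practical as input to conventional machine learning tools.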

The enhancements significantly increase the scope of data science algorithms that can be run against a graph beyond the basic set that was included when the library was introduced in April. They’re part of Neo4j’s broader goal to take graph databases beyond queries of raw data to predict outcomes based on connections.

The company is adding three new embedding options. First is Node2Vec, a popular graph embedding algorithm that uses neural networks to learn continuous feature representations for nodes, which can then be used for downstream machine learning tasks.
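Node2Vec, as described in its original research, builds its training data by taking biased random walks through the graph and then feeding those walks to a neural network much as sentences are fed to a word-embedding model. The sketch below shows only the walk-sampling step on a made-up toy graph. The graph, the function name and the parameter values are hypothetical, and this is not Neo4j's implementation.

```python
import random

# Toy undirected graph as adjacency sets (hypothetical data).
GRAPH = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b", "e"},
    "e": {"d"},
}


def biased_walk(graph, start, length, p=1.0, q=1.0, rng=None):
    """Sample one node2vec-style second-order random walk.

    p is the return parameter (higher p makes stepping back less likely);
    q is the in-out parameter (lower q favors moving away from the start).
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbors = sorted(graph[current])
        if not neighbors:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # First step has no previous node, so pick uniformly.
            walk.append(rng.choice(neighbors))
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbors:
            if candidate == previous:
                weights.append(1.0 / p)   # step back to where we came from
            elif candidate in graph[previous]:
                weights.append(1.0)       # stay near the previous node
            else:
                weights.append(1.0 / q)   # explore farther away
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# One seeded walk per node so the sample is reproducible.
walks = [
    biased_walk(GRAPH, node, 6, p=1.0, q=0.5, rng=random.Random(i))
    for i, node in enumerate(sorted(GRAPH))
]
for walk in walks:
    print(" -> ".join(walk))
```

In a full pipeline, many such walks per node would be passed to a skip-gram model to produce the actual embedding vectors; that training step is omitted here.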

FastRP (fast random projection) is a node-embedding algorithm that Neo4j says is up to 75,000 times faster than Node2Vec while delivering equivalent accuracy and scaling to extremely large graphs. Although it’s functionally equivalent to Node2Vec, Frame said many data scientists will likely use both.

“FastRP is lightning fast but more work to tune the embeddings to know what you want,” she said. “Many customers will run Node2Vec till they get results that make sense to them and then go to FastRP to run them at scale.”
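The speed of a random-projection approach comes from skipping neural network training altogether: each node starts with a sparse random vector, and those vectors are repeatedly averaged across neighbors. What follows is a heavily simplified sketch of that idea, not Neo4j's code. The toy graph, the function names and the settings for dimension, sparsity and iteration weights are assumptions for illustration, and the published algorithm's degree-based normalization is left out.

```python
import math
import random

# Toy undirected graph as sorted adjacency lists (hypothetical data).
STAR = {
    "hub": ["x", "y", "z"],
    "x": ["hub"],
    "y": ["hub"],
    "z": ["hub", "w"],
    "w": ["z"],
}


def sparse_random_vectors(nodes, dim, sparsity=3, seed=0):
    """Give each node a very sparse random vector with entries +s, 0 or -s."""
    rng = random.Random(seed)
    scale = math.sqrt(sparsity)
    values = [scale, 0.0, -scale]
    probs = [1 / (2 * sparsity), 1 - 1 / sparsity, 1 / (2 * sparsity)]
    return {node: rng.choices(values, weights=probs, k=dim) for node in nodes}


def propagate(graph, vectors):
    """Replace each node's vector with the mean of its neighbors' vectors."""
    dim = len(next(iter(vectors.values())))
    result = {}
    for node, neighbors in graph.items():
        if not neighbors:
            result[node] = [0.0] * dim
            continue
        result[node] = [
            sum(vectors[m][i] for m in neighbors) / len(neighbors)
            for i in range(dim)
        ]
    return result


def l2_normalize(vec):
    """Scale a vector to unit length; leave all-zero vectors untouched."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fastrp_sketch(graph, dim=8, iteration_weights=(0.0, 1.0, 1.0), seed=0):
    """Weighted sum of normalized, repeatedly propagated random vectors."""
    current = sparse_random_vectors(sorted(graph), dim, seed=seed)
    embedding = {node: [0.0] * dim for node in graph}
    for weight in iteration_weights:
        current = propagate(graph, current)
        for node in graph:
            step = l2_normalize(current[node])
            embedding[node] = [e + weight * s for e, s in zip(embedding[node], step)]
    return embedding


embeddings = fastrp_sketch(STAR, dim=8, seed=42)
for node in sorted(embeddings):
    print(node, [round(v, 2) for v in embeddings[node]])
```

In this toy graph, nodes "x" and "y" have identical neighborhoods and therefore end up with identical vectors, which hints at how such embeddings capture structural position. The iteration weights are one example of the tuning Frame alludes to.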

GraphSAGE is an embedding algorithm and process for inductive representation learning on graphs that uses graph convolutional neural networks. Once trained, the model can be applied continuously as the graph updates.
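"Inductive" here means the model learns a function that combines a node's own features with those of its neighbors, so nodes that arrive later can be embedded without retraining. The minimal sketch below shows a single mean-aggregation layer in the style of GraphSAGE. The weight matrices are fixed, made-up stand-ins for trained parameters, and the graph, features and function names are hypothetical.

```python
# Hypothetical stand-ins for weights a trained model would have learned.
W_SELF = [[1.0, 0.0], [0.0, 1.0]]
W_NEIGH = [[0.5, 0.0], [0.0, 0.5]]


def mean_vector(vectors, dim):
    """Element-wise mean of a list of vectors (zeros if the list is empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def sage_layer(graph, features, node, w_self, w_neigh):
    """One mean-aggregator layer: ReLU(W_self x_v + W_neigh mean(x_neighbors))."""
    dim = len(features[node])
    neighborhood = mean_vector([features[m] for m in graph[node]], dim)
    own_part = matvec(w_self, features[node])
    neigh_part = matvec(w_neigh, neighborhood)
    return [max(0.0, a + b) for a, b in zip(own_part, neigh_part)]


# Toy graph and node features (hypothetical data).
graph = {"a": ["b"], "b": ["a"]}
features = {"a": [1.0, 0.0], "b": [0.0, 2.0]}

embedding_a = sage_layer(graph, features, "a", W_SELF, W_NEIGH)

# A new node arrives later; the same layer embeds it with no retraining.
graph["c"] = ["a"]
graph["a"].append("c")
features["c"] = [2.0, 2.0]
embedding_c = sage_layer(graph, features, "c", W_SELF, W_NEIGH)

print("a:", embedding_a)
print("c:", embedding_c)
```

A real GraphSAGE model stacks several such layers, samples a fixed number of neighbors per node and learns the weights through training; none of that is shown here.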

The upshot is that “we’re taking techniques that used to require a Ph.D. and democratizing them so anyone can download and have the power of graph predictions,” said Frame, who holds a Ph.D. “Before, we’d use a graph to store the data with the machine learning happening in Python. We’re connecting the dots.”

Image: Neo4j
