Graphs are everywhere. From websites adding social capabilities, to telcos providing personalized customer services, to innovative bioinformatics research, organizations are building graphs into their products and services. Many high-profile companies are adopting graph databases specifically to manage the complexity of the social graph and to meet the query performance required at Internet scale. As websites scale from zero to millions of users, traditional relational databases can degrade to paralyzing levels of performance on connected data. Graph databases, based on decades of research, are designed to model and query connected data without that degradation as the graph grows. However, ‘going social’ does not come without its challenges. To extract valuable information from interconnected data, organizations must handle connected data at massive scale. Companies now look to graph databases to solve the data challenges that come with going social.
Graph databases and the social graph
Graph databases are among the most scalable, high-performance ways to store and query highly interconnected data. They help improve intelligence, predictive analytics, social network analysis, and decision and process management, all of which involve highly connected data with many relationships.
A relevant use case for graph databases is the social graph. The social graph draws on information across a range of networks to understand the relationships between individuals. Facebook, LinkedIn and Amazon are all examples of companies that have derived tremendous value from social and professional graphs and from deeper analysis of the data they collect every day. The biggest challenge these companies face is handling the exponential growth and the massive volume of connected data that come with the social graph.
Many applications today handle data that is deeply associative, i.e. structured as graphs (networks). Examples include social networking sites, tagging systems, content management systems and wikis, all of which deal with inherently graph-like data.
This poses a challenge, because recursive data structures are difficult to handle in traditional relational databases. Each traversal along a link in the graph becomes a join, and joins are expensive, particularly when several are chained over large tables. With user-driven content, it is also difficult to anticipate the exact schema of the data that will be handled. The relational model requires an upfront schema, which makes it hard to fit this more dynamic, ad hoc data.
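To make the cost of traversal-as-join concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `friendship` table, the sample names and the helper functions are all hypothetical, chosen only for illustration. It shows that finding friends of friends already needs the table joined to itself once, and every additional hop would add another join.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, made-up friendship table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE friendship (person TEXT, friend TEXT)")
    edges = [
        ("alice", "bob"),
        ("bob", "carol"),
        ("carol", "dave"),
        ("alice", "erin"),
        ("erin", "carol"),
    ]
    # Friendship is symmetric here, so store each edge in both directions.
    both_ways = edges + [(b, a) for a, b in edges]
    conn.executemany("INSERT INTO friendship VALUES (?, ?)", both_ways)
    return conn


def friends_of_friends(conn, name):
    """Return people exactly two hops from `name`, sorted by name.

    The second hop is expressed as a self-join on the same table;
    a third hop would need yet another join, and so on.
    """
    rows = conn.execute(
        """
        SELECT DISTINCT f2.friend
        FROM friendship AS f1
        JOIN friendship AS f2 ON f2.person = f1.friend
        WHERE f1.person = ?
          AND f2.friend <> ?
          AND f2.friend NOT IN (
              SELECT friend FROM friendship WHERE person = ?
          )
        ORDER BY f2.friend
        """,
        (name, name, name),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(friends_of_friends(conn, "alice"))
```

The query depth is fixed in the SQL text, so asking for three or four hops means writing a different, longer query rather than changing a parameter.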
This is where graph databases shine. A graph database represents information using nodes, relationships between nodes and key-value properties instead of tables. This model is substantially faster for associative data sets and uses a schema-less, bottom-up approach that is well suited to capturing ad hoc and rapidly changing data.
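The node, relationship and property model described above can be sketched in a few lines of plain Python. This is an in-memory illustration only, not how any particular graph database is implemented; the class names, the `KNOWS` relationship type and the `reachable` helper are assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    properties: dict = field(default_factory=dict)  # free-form key-value pairs


@dataclass
class Relationship:
    start: int
    end: int
    rel_type: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy property graph: no schema, each node keeps its own outgoing links."""

    def __init__(self):
        self.nodes = {}
        self.outgoing = {}  # node_id -> list of Relationship

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)
        self.outgoing.setdefault(node_id, [])
        return self.nodes[node_id]

    def relate(self, start, end, rel_type, **properties):
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both nodes must exist before they are related")
        rel = Relationship(start, end, rel_type, properties)
        self.outgoing[start].append(rel)
        return rel

    def reachable(self, start, rel_type, max_depth):
        """Breadth-first traversal; returns {node_id: depth} up to max_depth."""
        depth = {start: 0}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            if depth[current] == max_depth:
                continue  # do not expand beyond the requested depth
            for rel in self.outgoing[current]:
                if rel.rel_type == rel_type and rel.end not in depth:
                    depth[rel.end] = depth[current] + 1
                    queue.append(rel.end)
        del depth[start]
        return depth


if __name__ == "__main__":
    g = PropertyGraph()
    g.add_node(1, name="Alice")
    g.add_node(2, name="Bob")
    g.add_node(3, name="Carol", joined=2011)  # extra property, no schema change
    g.relate(1, 2, "KNOWS", since=2009)
    g.relate(2, 3, "KNOWS")
    print(g.reachable(1, "KNOWS", max_depth=2))
```

In this sketch the traversal depth is an ordinary parameter, and each step only follows the links stored on the current node, so the work done depends on the neighbourhood visited rather than on how many nodes the graph holds overall.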
Why graph databases?
Graph databases improve intelligence, predictive analytics, social network analysis, and decision and process management. Many organizations, from websites adding social capabilities to telecommunication companies providing personalized customer services to innovative bioinformatics research groups, have started to realize that graph databases are one of the best ways to model and query connected data. Anyone with a Facebook account has seen the kind of result that graph databases make possible.
According to former Forrester analyst James Kobielus, the market for graph databases will boom in 2012 as companies everywhere adopt them for social media analytics. Social graph analysis, although not a brand-new field, will become one of the most prestigious specialties in the data science arena.
Why is this? Graph databases find relationships between disparate pieces of data. They can run analyses over terabytes of information while preserving the relationships within the data, even as it changes and evolves. Traditional relational databases, by contrast, can degrade to paralyzing levels of performance as websites scale from zero to millions of users.
Graph databases simplify application development, resulting in shorter development times, lower maintenance costs and higher performance. Socially enabled applications are gravitating towards graph databases because other types of databases are not effective at managing relationships between millions of users with multiple connections. A graph database is the ideal solution for any application that relies on the relationships between records.
Social graph database technology will become a key trend in the data science arena throughout 2012 and beyond. We’ve already seen organizations flock to the social graph to help build software, web and mobile applications that draw on information across a range of networks to understand the relationships between individuals. If an organization’s data contains many many-to-many relationships, if recursive self-joins are too costly or too limiting for the application’s scaling needs, or if the primary objective is to quickly find connections, patterns and relationships between objects in large volumes of data, graph databases are the best solution.
About the author
Emil Eifrem is the founder of the Neo4j project, the world’s leading graph database, and CEO of Neo Technology. Emil is an internationally recognized thought leader in new database technology, having spoken at conferences on three continents.