Building Big Data: How Facebook’s Graph Search Was Born

Facebook’s revolutionary search strategy, Graph Search, represents an intriguing marriage of Big Data and social networking that should allow people and businesses to connect in dozens of new and exciting ways.

In case you missed our earlier coverage, Graph Search is essentially an all-encompassing Facebook search engine, based on your unique ‘social graph’, that lets you track down other users according to specified criteria. Users can enter specific queries such as “friends of friends located in New York” or “best places to eat in San Francisco” and gain insights from their own personal network. Naturally, given Facebook’s overwhelming size, its search tool will be scouring an immense amount of Big Data, which has led many people to ask just how on earth the company managed to build this thing.
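To make the idea of querying a social graph concrete, here is a minimal Python sketch of a “friends of friends located in New York” lookup. It is purely illustrative: the names, the toy data and the `friends_of_friends_in` function are all made up for this example and say nothing about how Facebook actually stores or queries its graph.

```python
# Toy social graph. Every name and city here is invented for illustration.
FRIENDS = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "erin"},
    "dave": {"bob"},
    "erin": {"carol"},
}
CITY = {
    "alice": "Boston",
    "bob": "New York",
    "carol": "Chicago",
    "dave": "New York",
    "erin": "New York",
}


def friends_of_friends_in(user, city):
    """Return people two hops from `user` who live in `city`, sorted by name."""
    direct = FRIENDS.get(user, set())
    candidates = set()
    for friend in direct:
        # Collect everyone each direct friend is connected to.
        candidates |= FRIENDS.get(friend, set())
    # Exclude direct friends and the user themselves.
    candidates -= direct
    candidates.discard(user)
    return sorted(p for p in candidates if CITY.get(p) == city)


print(friends_of_friends_in("alice", "New York"))
```

Even in this tiny example the query has to walk relationships first and filter on attributes second, which hints at why a plain keyword index is a poor fit for this kind of search.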

Facebook has clearly made a monumental effort with Graph Search. The world’s largest social network boasts more than one billion active users, who share about 2.5 billion pieces of content every single day and ‘like’ another 2.7 billion. All this is possible thanks to a complex web of trillions of connections that allows content to be shared across the entire network in a matter of minutes.

So how on earth have they built it? Facebook has by and large been pretty silent on the matter, although Mark Zuckerberg did reveal to SiliconANGLE founder John Furrier that the company had designed its own system – nicknamed Unicorn – and mentioned that it involved two distinct parts: a front end to handle the queries and natural language, and a back-end, persistent storage system. Facebook also gave away a few clues on its Facebook Engineering Page, but for the most part data experts are at a loss as to how Graph Search was built.

Even Google has problems with search: its database is huge, and many of the results it throws up are questionable. In Facebook’s case, things become even more complex when one considers that Graph Search is no regular search. It is an entirely new technology, built from scratch, that doesn’t just search through all of Facebook’s data but also looks for relationships and includes elements of semantic search, a technology that many had presumed was still in its infancy.

Building Social Search From Scratch

There are some clues as to what Facebook has done. One item of interest is a blog post written by a Facebook engineer last year, which describes the numerous challenges of working with cached content. From this, we can gather that the Unicorn system is likely to be based on Facebook’s own TAO database, which acts as a caching layer above the thousands of MySQL databases that the company uses.
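For readers unfamiliar with the idea of a caching layer sitting above a database, the sketch below shows the general pattern in Python. It is a generic read-through cache with invalidation on write, not a description of Facebook’s actual design: the `ReadThroughCache` class is hypothetical, and a plain dictionary stands in for the MySQL tier.

```python
class ReadThroughCache:
    """Generic caching layer in front of a slower backing store (illustrative only)."""

    def __init__(self, backing_store):
        self._store = backing_store  # stand-in for a database; here just a dict
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        # Serve from memory when possible.
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        # Otherwise fall through to the backing store and remember the result.
        self.misses += 1
        value = self._store.get(key)
        if value is not None:
            self._cache[key] = value
        return value

    def put(self, key, value):
        # Write to the backing store, then drop any stale cached copy.
        self._store[key] = value
        self._cache.pop(key, None)


database = {"user:1": "Alice"}
cache = ReadThroughCache(database)
cache.get("user:1")  # miss: loaded from the backing store
cache.get("user:1")  # hit: served from the cache
print(cache.hits, cache.misses)
```

Keeping the cached copy consistent with the database after every write is the hard part of this pattern, and it only gets harder when the cache is spread across many machines.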

Another clue comes from Wikibon’s Big Data expert Jeff Kelly (see video below), who surmises that Facebook has likely combined its own internally developed code with a variety of open-source tools such as Cassandra and Apache Lucene/Solr.

Jeff Kelly, discussing his theories on Graph Search.

Facebook’s Lars Rasmussen offers up some further clues in his own blog post that accompanied the big announcement of Graph Search, revealing the daunting scale of the challenge that engineers must have faced:

“In 2011, Zuck asked the search team to design and build a new system that would recreate the ability to search the entire social graph,” writes Rasmussen.

“This was an interesting challenge because–compared to large document collections like the Web–the data in our databases have significantly more explicit structure than free-flowing text. Therefore, a traditional keyword-based search product might not be the answer.”

Rasmussen goes on to explain how Facebook’s engineers were able to come up with their own system for searching its ‘explicitly structured’ data, but then faced a big problem. Many Facebook users place restrictions on who can see the content they share, so Graph Search needs to return unique results for each user, showing only the content that has been shared with them. That becomes more challenging the more complex the query is.
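The privacy requirement described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the `Post` type, the three audience levels and the `search` function are invented for this example, and real privacy settings are far more fine-grained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    owner: str
    text: str
    audience: str  # assumed levels: "public", "friends" or "only_me"


def visible_to(viewer, post, friends):
    """Decide whether `viewer` is allowed to see `post`."""
    if post.owner == viewer or post.audience == "public":
        return True
    if post.audience == "friends":
        return viewer in friends.get(post.owner, set())
    return False  # "only_me" or any unknown setting


def search(viewer, keyword, posts, friends):
    """Keyword match first, then keep only what this viewer may see."""
    needle = keyword.lower()
    return [
        p for p in posts
        if needle in p.text.lower() and visible_to(viewer, p, friends)
    ]


friends = {"alice": {"bob"}, "bob": {"alice"}}
posts = [
    Post("alice", "Great sushi in San Francisco", "friends"),
    Post("carol", "Sushi night!", "public"),
    Post("alice", "Secret sushi spot", "only_me"),
]
print([p.text for p in search("bob", "sushi", posts, friends)])
print([p.text for p in search("dave", "sushi", posts, friends)])
```

The same query returns different results for different viewers, so results cannot simply be computed once and reused for everyone.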

According to Rasmussen, the first prototypes of the technology were built on simple graphical interfaces, “that allowed users–click-by-click–to build up structured, database-like queries.” However, these seemed to be too complex and not up to the challenge set by Zuckerberg.

Later advancements led to a second prototype that, in Rasmussen’s words, “was a naive, exponential-time ‘parser’ written in JavaScript that could mimic the experience we were looking for as long as the searcher input no more than a few tokens.”
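To see why a naive parser can take exponential time, consider the Python sketch below. It is not Facebook’s prototype, which was written in JavaScript and has not been published in detail. It is simply one brute-force approach, with a made-up `LEXICON`, that tries every way of splitting a query into phrases. A query of n tokens has 2^(n-1) possible splits, which is why this style of parser only copes with a handful of tokens.

```python
# Hypothetical phrase lexicon; the labels are invented for illustration.
LEXICON = {
    "friends": "FRIENDS",
    "friends of": "FRIENDS_OF",
    "of": "OF",
    "in": "IN",
    "new york": "PLACE",
}


def segmentations(tokens):
    """Yield every split of `tokens` into contiguous phrases (2**(n-1) splits)."""
    if not tokens:
        yield []
        return
    for i in range(1, len(tokens) + 1):
        head = " ".join(tokens[:i])
        for rest in segmentations(tokens[i:]):
            yield [head] + rest


def parse(query):
    """Return every segmentation whose phrases all appear in the lexicon."""
    tokens = query.lower().split()
    results = []
    for seg in segmentations(tokens):
        if all(phrase in LEXICON for phrase in seg):
            results.append([(phrase, LEXICON[phrase]) for phrase in seg])
    return results


for reading in parse("friends of friends in New York"):
    print(reading)
```

Each extra token doubles the number of candidate splits, so a ten-token query already produces 512 of them before any ranking or graph lookup takes place.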

Just The Beginning

That’s about as far down the road as Facebook has progressed at the moment. Graph Search is still evolving and far from complete. As of today, only a very limited number of users can try it, and it can search only for people, places, photos and interests, not the timelines and comments that make up the bulk of Facebook’s enormous dataset.

Even so, it’s clear that Facebook is determined to make this work at all costs:

“Today’s Graph Search beta is just the beginning,” adds Rasmussen.

“We’re starting with a focus on people, photos, places and interests, but are looking forward to incorporating posts and Open Graph actions, as well as making Graph Search available on mobile and in every language. We’re excited to be able to keep making search more useful, fun and central to how you explore existing connections and make new ones on Facebook.”

About Mike Wheatley

Mike Wheatley is a senior staff writer at SiliconANGLE. He loves to write about Big Data and the Internet of Things, and to explore how these technologies are evolving within the enterprise and helping businesses become more agile. Before joining SiliconANGLE, Mike was an editor at Argophilia Travel News and an occasional contributor to The Epoch Times, and he has also dabbled in SEO and social media marketing. He usually bases himself in Bangkok, Thailand, though he can often be found roaming through the jungles or chilling on a beach. Got a news story or tip? Email Mike@SiliconANGLE.com.