Inside Lucata’s graph analytics-optimized superserver
The field of graph analytics was virtually unknown outside of the academic realm just a few years ago, but all that has changed with the market’s insatiable hunger for all things big data. The top two companies in the graph database market, Neo4j Inc. and TigerGraph Inc., have collectively raised more than $760 million, and startup activity is brisk.
One of those startups is making hardware. Earlier this month, Lucata Corp. announced an architectural extension to Intel Corp. processors that is optimized for graph analytics processing at a massive scale. Lucata says its Pathfinder server provides as much graph analytics processing power in a single rack as 16 racks of Intel Xeon servers while consuming 90% less power. Up to 8,000 chassis can be lashed together and share a single memory plane.
Lucata, which was founded by a team of richly credentialed computer science Ph.D.s, is achieving these results using field-programmable gate arrays, which are integrated circuits that are designed to be programmed for specific functions. They are notable for their low power consumption and flexibility but are not replacements for general-purpose central processing units.
Built to scale
Lucata has optimized its FPGAs for random memory access, which makes its server a good fit for its target market. Graph analytics uses a set of specialized tools to determine the strength and direction of relationships between objects. Graphs can be used to rapidly answer questions such as how many “friend” relationships are required to get from one user to another in a social network or the likelihood that a customer who buys flowers will be interested in ice cream as well.
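The “friend” question above is a shortest-path query, and a breadth-first search is one common way to answer it. What follows is a minimal sketch in Python, not Lucata’s software: the function name `degrees_of_separation` and the tiny `FRIENDS` graph are made up for illustration.

```python
from collections import deque


def degrees_of_separation(friends, start, goal):
    """Return the fewest friend hops from start to goal, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])  # (user, hops taken to reach that user)
    while queue:
        user, hops = queue.popleft()
        for friend in friends.get(user, ()):
            if friend == goal:
                return hops + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    return None


# Hypothetical friendship graph, stored as adjacency lists (illustration only).
FRIENDS = {
    "ana": ["ben", "cy"],
    "ben": ["ana", "dee"],
    "cy": ["ana", "dee"],
    "dee": ["ben", "cy", "eli"],
    "eli": ["dee"],
    "flo": [],
}

print(degrees_of_separation(FRIENDS, "ana", "eli"))
```

Each step of the search jumps to whichever users happen to be friends of the current one, so the memory it touches follows the shape of the graph rather than any predictable order.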
Graph analytics assumes that nothing is known about linkages in a data set, which means the more memory that can be thrown at the job, the better.
“Say you have a graph of everybody who belongs to Facebook — those people have no geographic orientation, so you can’t predict what linkages will occur or how the nodes will be connected,” said Martin Deneroff, Lucata’s chief operating officer and recipient of numerous patents in system design. “If you’re mapping everybody connected to a particular node you would need many servers to access that much memory. The majority of those accesses won’t be on the node where the program is running.”
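A rough way to see why most accesses land on other machines: if vertices are spread uniformly at random across N servers, the chance that both ends of a link sit on the same server is about 1/N. The short simulation below illustrates that under those assumptions only; the function `remote_fraction`, the random graph and all of its sizes are hypothetical and do not model any real social network.

```python
import random


def remote_fraction(num_servers, num_vertices=10_000, num_edges=50_000, seed=0):
    """Estimate the share of random edges whose endpoints live on different servers."""
    rng = random.Random(seed)
    # Assumption: each vertex is placed on a server uniformly at random.
    home = [rng.randrange(num_servers) for _ in range(num_vertices)]
    remote = 0
    for _ in range(num_edges):
        u = rng.randrange(num_vertices)
        v = rng.randrange(num_vertices)
        if home[u] != home[v]:
            remote += 1
    return remote / num_edges


for n in (1, 2, 8, 64):
    print(n, "servers: simulated", round(remote_fraction(n), 3),
          "vs. 1 - 1/N =", round(1 - 1 / n, 3))
```

With a single server nothing is remote; as the server count grows, the simulated share of remote links approaches 1.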
A different kind of processor
That’s where conventional CPU-based servers don’t lend themselves well to graph scenarios, Deneroff said. For one thing, CPUs are limited in the amount of memory they can address. Scaling beyond that limit requires adding more CPUs, which adds network latency and administrative overhead.
CPUs are also optimized for transaction processing across a limited body of known elements, such as records in a database. They essentially bet on which data and instructions will be needed next and load them into a cache preemptively for better performance. “The fact that you recently addressed a memory location implies that the next one will be nearby,” Deneroff said.
Lucata has no advantage in such a scenario. However, “once that assumption breaks so that cache hit ratios are low, our architecture excels,” he said. “We’re orders of magnitude faster.”
In Lucata’s purpose-built servers, all the memory is in a single flat address space and is treated as local by the FPGA processors. “The programmer doesn’t need to worry about where the data is. As you add more chassis, the addressable memory space expands accordingly,” Deneroff said.
Memory size scales linearly with the number of nodes and performance scales nearly linearly, so “a program can run on a single, eight-node chassis or a 1,000-node system without changing anything,” he said. “Because we have a high-performance flat network with uniform bandwidth, performance scales almost linearly no matter how many nodes you add. You lose no more than a couple of percent of performance each time you double capacity.”
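To put the “couple of percent” per doubling in perspective, the arithmetic can be sketched as follows. The 2% figure is an assumed reading of Deneroff’s remark, and 1,024 nodes stands in for the 1,000-node system so that the doublings from eight nodes come out even; neither number is from Lucata.

```python
import math


def scaling_efficiency(start_nodes, end_nodes, loss_per_doubling):
    """Fraction of ideal performance left after growing from start_nodes to end_nodes."""
    doublings = math.log2(end_nodes / start_nodes)
    return (1.0 - loss_per_doubling) ** doublings


# Assumed values for illustration: 2% lost per doubling, 8 -> 1,024 nodes.
efficiency = scaling_efficiency(8, 1024, 0.02)
ideal_speedup = 1024 / 8
print(f"efficiency: {efficiency:.3f}")
print(f"speedup: {ideal_speedup * efficiency:.1f}x of an ideal {ideal_speedup:.0f}x")
```

Under these assumptions, seven doublings leave roughly 87% of ideal efficiency.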
Taking programs to data
Lucata also uses a patented technology called migratory threads that essentially flips the conventional processing architecture on its head.
In a typical computing scenario, a CPU that needs to access memory outside of its local address space must send a remote procedure call to another node over the network. Migratory threading “turns that upside down and moves the program to where the data is,” Deneroff said. Because programs in a graph context are usually much smaller than the databases they access, this approach cuts down on network traffic and its accompanying latency.
“We ship a copy of the program across the network to the CPU that’s closest to the data we want to access,” Deneroff said. “You get between 10 to 1,000 times less network traffic.”
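The trade-off can be illustrated with a toy traffic count. This is a simplified model, not a description of Lucata’s patented mechanism: the function `traffic_bytes` and every byte size in it are assumptions chosen only to show the shape of the comparison. A conventional program stays on its home node and fetches each remote record, while a migrating thread moves its small context to whichever node owns the next record.

```python
def traffic_bytes(owners, record_bytes, request_bytes, context_bytes, home=0):
    """Return (fetch_traffic, migrate_traffic) in bytes for one walk over a graph.

    owners lists, in visit order, the node that holds each record the walk touches.
    """
    fetch = 0      # program stays home and pulls remote records to itself
    migrate = 0    # thread context moves to the node that owns the record
    location = home
    for owner in owners:
        if owner != home:
            fetch += request_bytes + record_bytes
        if owner != location:
            migrate += context_bytes
            location = owner
    return fetch, migrate


# Hypothetical walk and sizes: 4 KB records, 64-byte requests, 256-byte thread context.
walk = [0, 1, 1, 2, 2, 2, 3, 0]
fetch, migrate = traffic_bytes(walk, record_bytes=4096, request_bytes=64, context_bytes=256)
print("fetch-to-program bytes:", fetch)
print("migrate-to-data bytes:", migrate)
```

The ratio between the two totals depends entirely on the assumed record and context sizes and on how often consecutive accesses share a node, so this sketch does not reproduce the 10-to-1,000-times figure Deneroff cites.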
Graph analytics isn’t the only potential use case for large-memory machines, but the company chose to start there because the market is large and growing, he said. Lucata, which has raised $29 million, hopes to migrate to an architecture based on application-specific integrated circuits, which can scale linearly with clock rate. However, “it’s a big commitment,” Deneroff said. “We did not have the means to build an ASIC” within the available timeframe and funding.