UPDATED 11:29 EDT / FEBRUARY 06 2014

NEWS

Facebook data centers debunk myths: Efficiency is profitable | #OCPSummit

A couple of days prior to Facebook’s 10-year anniversary, Jay Parikh, the company’s VP of Infrastructure, took the stage at the Open Compute Summit V in San Jose, California, delivering a detailed presentation on Facebook’s history, breakthroughs and vision for the future.

Titled “Efficiency is Profitable: Facebook’s Approach to Global Infrastructure Optimization,” the panel brought Parikh together on stage with Facebook’s Matt Corddry, Director of Hardware Engineering, and Marco Magarelli, Design Engineer.

“At the previous Open Compute Summit you’ve heard the Facebook team talking a lot about the big challenges that we’ve been facing in solving for our infrastructure,” started Parikh. “You’ve heard about things like cold storage, vendor agnostic motherboards and what we’ve done in our electrical and mechanical design, increasing the efficiency of the data centers.”

Parikh’s intention this year was to take a step back, walk through a bit of history and explain why all of this work actually matters.

“This picture should look familiar to everyone in the room,” said Parikh, indicating a standard set of racks from the days when Facebook was still the new kid on the block.

“We ran a lot of open source software when we started Facebook, and, over the years, as Facebook grew, the infrastructure team focused on only one thing: keeping the site up and running. We made sure that user growth could continue and that our product development, which was very rapid, could continue at that pace.”

According to Parikh, all this was a really challenging task; they succeeded in making it work, but it wasn’t that pretty. “We stretched the practical limits of every part of our infrastructure over and over again: the software, the hardware, the data center and the network. The story that we had to buy fans and cool off the data center is not folklore,” he confirmed.

“We had some spectacular failures, but in the process we’ve learned some basic things: at scale, all your problems are going to be magnified.”

The cost of Facebook’s business


Parikh continued his presentation by highlighting the cost impact on businesses, the performance impact on users and applications and the operational burden on the team who spends time trying to keep things running instead of thinking of ways to move things forward.

Therefore, explained Parikh, they started off humbly, rethinking the infrastructure at the data center level. The electrical schematic was sketched one night at 2 A.M.; it led to the first data center design, which has been replicated many times since. In parallel, the hardware team started a hardware lab, trying to answer the question “What can we do to make things more cost-effective and more energy-efficient?” The data center and hardware teams worked together on these optimizations, and the only constraint was ‘move fast.’

Renewable energy

 

“We did a lot of work trying to minimize the amount of energy we pull off the grid to power our workload at Facebook,” explained Parikh. “Our goal was, by 2015, to be 25 percent renewable.”

Today that goal seems within reach: “One of our data centers is 100 percent hydro and another one is 100 percent wind,” said Jay Parikh.


Full-stack optimization

 

“Over time, we were forced to take control over each part of the stack, because we had to keep up with the user growth. We also needed to keep a balance with flexibility. As we launch products all the time, sometimes multiple times a day, we didn’t want our infrastructure to get so rigid, or so cheap, that it slowed us down from a development perspective,” added Parikh. “Balancing flexibility and efficiency is very hard.”

On the software side, Facebook basically rewrote its front-end web stack, as it would have been impossible to buy five times more servers to keep up with user growth. This had a massive impact.

As far as the network is concerned, the machine-to-machine traffic inside of Facebook is enormous and, according to Parikh, “it has grown about 25 times just in the last few years.” The software system runs on tens of thousands of servers, serving over 4 billion operations every second. “This is the type of scale we need to be looking at in building our network stack,” specified Parikh.

Regarding the servers and storage, the original OCP design was 38 percent more energy efficient, costing 24 percent less. “I am very proud of the energy efficiency levels achieved in our data centers,” stated Jay Parikh. “Of course, all of this matters in terms of saving money. And we’re not just saving money, we’re saving a ton of money,” he emphasized. “Over the last 3 years, our infrastructure and our focus on efficiency have saved us over 1.2 billion dollars.”
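To put those percentages in context, here is a minimal back-of-the-envelope sketch in Python of how per-server efficiency gains compound into fleet-level savings. The 38 percent energy and 24 percent cost reductions are the figures Parikh cited; the fleet size, server price, energy use and electricity rate are purely hypothetical placeholders, not Facebook numbers.

```python
# Back-of-the-envelope sketch: how per-server efficiency gains compound
# into fleet-level savings. The 38% energy and 24% cost reductions are
# the figures cited in the talk; every other number is a hypothetical
# placeholder, not a Facebook figure.

FLEET_SIZE = 100_000           # hypothetical number of servers
BASELINE_SERVER_COST = 3_000   # hypothetical purchase price per server, USD
BASELINE_KWH_PER_YEAR = 2_000  # hypothetical energy use per server per year
ELECTRICITY_RATE = 0.07        # hypothetical USD per kWh
YEARS = 3

capex_savings = FLEET_SIZE * BASELINE_SERVER_COST * 0.24
energy_savings = (FLEET_SIZE * BASELINE_KWH_PER_YEAR * ELECTRICITY_RATE
                  * 0.38 * YEARS)

print(f"Capex saved:  ${capex_savings:,.0f}")
print(f"Energy saved: ${energy_savings:,.0f} over {YEARS} years")
```

Even with modest placeholder inputs, the two reductions multiply across a large fleet and across years, which is the mechanism behind the 1.2-billion-dollar figure Parikh quoted.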

Facebook data centers debunking myths


Next speaker on stage, Matt Corddry, Director of Hardware Engineering, proceeded to debunk the myth that “hardware development requires a tremendous team building custom designs.”


“Our team is so small, it could probably fit in a school bus. It’s a tight, compact, fast-moving group of folks that build this hardware,” clarified Corddry.

He shared with the audience some of the tricks learned along the way that keep Facebook lean and fast-moving:

1. Think big – “For us this means looking at the entire ecosystem that we’re building. It’s really easy to get overly focused.”

2. Be a user – “One of the requirements on my team is to work as a technician in a production data center. That gives us hands-on experience of what works and what doesn’t.”

3. Design with dollars – “It boils down to something really simple but, for some reason, we see people miss this one. You need to give your engineers full transparency into the cost model that drives their design. We empower our engineers with the actual costs of power, cooling, data center construction, network, the acquisition costs of the servers, storage and racks, because we are trying to influence the hundreds of small decisions that the engineers make every single day. It’s an incremental approach; you have to start with efficiency as one of your goals and you have to give the designers all the tools and the data that they need to make that happen, and trust them to do that for you,” explained Corddry.

4. Use the community – “We are surrounded by this amazing collection of hardware experts in this room; I tell my team to ‘only do what you must do’. Don’t try to reinvent technology and architecture that others in the community are already working on. No matter how many people we hire, there’s always going to be more smart brains outside of Facebook. We leverage the community and our partners extensively.”

“These sound like great principles and I’m sure you’ve heard some of them before,” concluded Corddry, who went on to exemplify how they actually produce results:

1. Think Big: “We actually planned Open Rack as a triplet. And then we thought ‘what if we filled it with storage?’ So we went on and built a mock-up rack, a 5,000-pound triplet. Unloading it from the truck took 10 guys, and its momentum still put a dent in the wall,” joked Corddry. “That made us reconsider the design and guided us to building a singlet, which is the Open Rack you see today. All the decisions are rooted in the fact that we were trying new things and being part of the operational world.”

2. Be a user: “Another lesson learned from going to the data centers and being part of the ops team is reuse. We are using the Group Hug design in something that we call the Honey Badger, which is a new storage adapter that brings microservers into Open Vault. It’s a new product Facebook is developing, which is going to be introduced into OCP later this year,” announced Corddry. “We’re also using the same microserver card in the second-generation Open Rack, which you’ll also see at the Open Rack Engineering Summit today. We’re simplifying our inventory of parts and the amount of work we’re doing, using open standards to solve multiple problems.”

3. Design with dollars: “Cost modeling forced us to compare spinning disks and SSDs. We’d always said spinning disks were cheaper, but then we started taking into consideration not just the acquisition costs but the lifetime power draw and the operational costs associated with the failure rate of these devices, and it showed that an SSD could work. But, when we looked at data-center-class SSDs, they were priced too high and wouldn’t have worked with the business model for a web server boot application. We identified a netbook-class SSD that offers a lower TCO than any spinning disk can.” (A sketch of this kind of TCO comparison follows this list.)

4. Use the community: “We still put some of our data center racks in co-location facilities, so we need a traditional input power shelf for a UPS-based data center environment. We found out that Rackspace and Delta had already done the work to develop that for Open Rack, and we were able to work with them, take their design, adapt a couple of details and get it rolled out. That only took a few months of work; if we’d started from scratch, it would have been a year of work to build that product. By working in this open community, we were able to move much faster.”
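To illustrate the kind of cost model Corddry describes, here is a minimal total-cost-of-ownership sketch in Python. The structure (acquisition cost plus lifetime power draw plus failure-driven operational cost) follows his description of the disk-versus-SSD comparison; every number below is a hypothetical placeholder, not an actual Facebook figure.

```python
from dataclasses import dataclass

# Minimal TCO sketch following Corddry's description: acquisition cost,
# lifetime power draw, and operational cost driven by failure rate.
# Every number below is a hypothetical placeholder for illustration only.

ELECTRICITY_RATE = 0.07   # hypothetical USD per kWh
REPLACEMENT_LABOR = 50.0  # hypothetical USD of technician time per failure
LIFETIME_YEARS = 4

@dataclass
class Device:
    name: str
    acquisition_cost: float     # USD
    avg_power_watts: float      # average draw in watts
    annual_failure_rate: float  # fraction of devices failing per year

    def tco(self) -> float:
        hours = LIFETIME_YEARS * 365 * 24
        energy_cost = self.avg_power_watts / 1000 * hours * ELECTRICITY_RATE
        failure_cost = (self.annual_failure_rate * LIFETIME_YEARS
                        * (REPLACEMENT_LABOR + self.acquisition_cost))
        return self.acquisition_cost + energy_cost + failure_cost

# Hypothetical boot-drive candidates
hdd = Device("spinning disk", acquisition_cost=45, avg_power_watts=6.0,
             annual_failure_rate=0.04)
ssd = Device("netbook-class SSD", acquisition_cost=60, avg_power_watts=1.5,
             annual_failure_rate=0.01)

for d in (hdd, ssd):
    print(f"{d.name}: ${d.tco():.2f} total cost over {LIFETIME_YEARS} years")
```

With these placeholder inputs the SSD comes out ahead despite its higher sticker price, which is exactly the kind of result Corddry says only becomes visible once engineers can see the full cost model behind their designs.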

In conclusion, noted Matt Corddry, “it doesn’t take an army to build open hardware or to be part of this ecosystem, if you think big, if you learn from your users, if you leverage the community around you and always model the total costs of your designs.”

Rethinking data center design, rapid scaling


The third speaker on the panel, Marco Magarelli, Design Engineer, elaborated on a couple of concepts Facebook is developing for rapid data center deployment.


 “Our goal was to develop the most efficient data center ecosystem possible,” admitted Magarelli. “We are constantly rethinking, trying to find ways to do things faster and better.”

To achieve these goals, Facebook got together with industry experts to develop ideas and strategies for delivering more with less (less cost, less time).

They went on to streamline their processes and simplify their details. They also worked with a couple of vendors, trying different scenarios to attain faster data center deployment.

“With one vendor we worked on a chassis approach, while also looking into the option of removing some of the structure. The idea with the UNIT IT model was to create data hall modules, electrical room skids and air handling solutions, and bring all three pieces together, delivering two data halls in the time it would take to build one,” explained Magarelli.


“With another vendor we tried to develop a flat pack scheme. Today we have a very rigid roof structure that needs to carry the weight of all our distribution as well as the penthouse, and we have to suspend our structure, requiring a lot of work and assembly in the field. We therefore wanted to develop something that was fully supported and panelized.”

“We are making a lot of progress on both of these concepts, and we are looking forward to sharing more of what we find,” bragged Marco Magarelli.

Wrapping up the Panel, Jay Parikh addressed the audience again, with a recap of Facebook’s accomplishments.

“Last year we talked about cold storage and the challenges of full stack optimization across hardware, software, network, data center for this massive amount of data that all of our businesses are accumulating and needing from all the applications that we are trying to build. Here’s a quick update: we have our first cold storage facility, now live, and we have 30 petabytes running in this facility; the second facility will be online shortly and we’ll ramp up to over 150 petabytes in the next month or two,” promised Parikh.

“I also talked about some crazy ideas around a new storage medium. We were looking at optical disks, wondering what we could possibly do with Blu-ray. A year later we’ve built our first prototype,” announced Parikh. “This cabinet stores 10,000 disks. It’s about a petabyte of storage and we’re trying to get it to 5 petabytes. We expect this device to have about 50 years of durability. It will also mean a 50 percent cost reduction for us as well as an 80 percent reduction in energy consumption, on top of what we did for cold storage.”
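As a quick sanity check on the capacity math, here is a small sketch in Python. The 10,000-disk count and the one- and five-petabyte figures come from Parikh’s remarks; the per-disk capacity is an assumption based on standard triple-layer BDXL Blu-ray media (roughly 100 GB), not a stated specification.

```python
# Rough capacity arithmetic for the Blu-ray cold storage cabinet.
# Disk count and petabyte targets come from Parikh's remarks; the
# per-disk capacity is an assumption (~100 GB BDXL), not a stated spec.

DISKS_PER_CABINET = 10_000
GB_PER_DISK = 100      # assumed triple-layer BDXL capacity
GB_PER_PB = 1_000_000  # gigabytes per petabyte (decimal)

cabinet_pb = DISKS_PER_CABINET * GB_PER_DISK / GB_PER_PB
print(f"~{cabinet_pb:.1f} PB per cabinet at {GB_PER_DISK} GB per disk")

# Capacity each disk would need for the 5 PB target in the same cabinet
# (otherwise the disk count would have to grow):
target_pb = 5
print(f"~{target_pb * GB_PER_PB / DISKS_PER_CABINET:.0f} GB per disk for {target_pb} PB")
```

At roughly 100 GB per disk, a 10,000-disk cabinet works out to about a petabyte, matching the figure Parikh quoted; reaching 5 petabytes in the same cabinet would call for media around 500 GB per disk or a denser design.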

“These are awesome examples of what this community can do, working together on the crazy concepts and the passion projects that we have, to solve these types of challenges,” added Parikh.

OCP and Facebook’s mission

 

In Parikh’s own words, “Facebook’s mission is pretty simple: connect everyone, understand the world, build the knowledge economy.”


“Speed of innovation and efficiency are going to be critical for us to succeed in this mission of connecting everyone,” admitted Parikh. “OCP is a critical element in us getting there as well, and this community is doing amazing things.”

His advice for other companies looking to cut their costs and speed up their performance is to “look into the OCP community.”

“Whatever your company’s vision is and whatever you’re trying to achieve, you should look in this community, as more and more building blocks are created every day,” Parikh said. “If you’re building your own infrastructure, you should really be looking at OCP; look at how much money you can save, look at the optimization that you get to take advantage of.”

More importantly, “if you don’t see what you need in this community, ask,” advised Parikh. “The community loves a good challenge and it’s here to help.”

