UPDATED 12:59 EDT / MAY 24 2019

BIG DATA

Let’s play with particle physics! Kubernetes and Google Cloud open CERN research to everyone 

Winning the Nobel Prize for physics isn’t a goal most people can reach. But thanks to Google Cloud and Kubernetes, performing the same experiments as award-winning scientists is now possible. Open access to data from the CERN Large Hadron Collider experiments that led to the discovery of the Higgs boson elementary particle in 2012 means that anyone, anywhere, can now reproduce the analysis that proved the particle’s existence.

“All this containerized infrastructure … is getting our soul together, because computing is getting much easier in terms of how to share pieces of software and even infrastructure,” said Ricardo Rocha (pictured, right), computing engineer at The European Organization for Nuclear Research, known as CERN.

Rocha and Lukas Heinrich (pictured, left), physicist at CERN, spoke with Stu Miniman (@stu), co-host of theCUBE, SiliconANGLE Media’s mobile livestreaming studio, and guest host and cloud economist Corey Quinn (@QuinnyPig) during the KubeCon + CloudNativeCon event in Barcelona, Spain. They discussed how CERN manages the massive amounts of data generated by the LHC (see the full interview with transcript here). (* Disclosure below.)

Heinrich is a member of the ATLAS research team, which, along with CERN’s CMS experiment, discovered evidence of the Higgs boson. He and Rocha replicated the analysis that proved the particle’s existence during their keynote address at this week’s KubeCon event.

CERN science creates super-sized data

Scale, latency and performance are concerns for any enterprise, but at CERN they take on a much larger significance. Two high-energy particle beams travel at close to the speed of light inside the 27 km ring of the LHC, with 1.7 billion particle collisions occurring per second.

“The machines can generate something around a petabyte [of data] a second,” Rocha said.

Analyzing this data is the task of the ATLAS trigger and data acquisition system. “We cannot write out all the collision data to disk; we don’t have enough disk space,” Heinrich said. Instead, the trigger system analyzes the data in real time and selects only the most interesting collisions to channel into storage.

The trigger system reduces this to around 10 gigabytes a second. “That’s what my side has to handle,” Rocha stated.
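To picture the trigger’s role, here is a minimal sketch in Python of that kind of streaming reduction. It is not CERN’s actual trigger code: the event format, the energy threshold and the is_interesting() selection function are hypothetical stand-ins for illustration.

```python
import random

def collision_events(n):
    """Yield n fake collision events; real ATLAS events are far richer."""
    for event_id in range(n):
        # Hypothetical stand-in for detector readout: one energy value per event.
        yield {"id": event_id, "energy_gev": random.expovariate(1 / 50)}

def is_interesting(event, threshold_gev=400):
    """Hypothetical selection: keep only rare high-energy events."""
    return event["energy_gev"] > threshold_gev

def trigger(events):
    """Filter the stream in real time instead of writing everything to disk."""
    return (e for e in events if is_interesting(e))

if __name__ == "__main__":
    kept = list(trigger(collision_events(1_000_000)))
    print(f"kept {len(kept)} of 1,000,000 events")  # only a tiny fraction survives
```

The real system makes this decision within microseconds across hardware and software stages, but the shape of the pipeline is the same: stream everything in, select aggressively, persist only a sliver.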

Businesses that think they have data storage problems will find them dwarfed by CERN’s inflow. “We’re collecting something like 70 petabytes a year,” Rocha said. “Our challenge is to make sure that all the effort physicists put into building this large machine, that in the end it’s not the computing that is breaking the world system. We have to keep up.”

Currently, CERN has one giant data center with around 300,000 cores and a capacity of around 400 petabytes. “That’s not enough,” Rocha stated.

Linking institutes and research labs around the globe has doubled the storage capacity, but with a major upgrade to the LHC underway, the pressure is on to expand. “Very soon we’ll be talking about exabytes, so the amount of computing we will need there is just going to explode,” Rocha explained.

Kubernetes to the rescue

All options are on the table to solve the problem, as the engineers at CERN tend to be results-oriented, according to Rocha. “It’s a more open-minded community than traditional IT. So we don’t care so much about which technology we use as long as the job gets done,” he said.

CERN had distributed infrastructure years before the rest of the industry adopted cloud, but in the past it had to write all of its own system software. Having access to open-source communities means CERN teams can now focus on application development.

“If we start writing software using Kubernetes, then not only do we get this flexibility of choosing different public clouds or different infrastructures, but also we don’t have to care so much about the core infrastructure, all the monitoring. We can remove a lot of the software we were depending on for many years,” Rocha stated.
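As an illustration of the portability Rocha describes, here is a minimal sketch using the official Kubernetes Python client to submit a batch job. The same few lines work against an on-premises cluster or a managed cloud one, because only the kubeconfig context changes; the job name, container image and command are hypothetical placeholders, not CERN’s actual workloads.

```python
from kubernetes import client, config

# Point at whichever cluster the current kubeconfig context selects:
# an on-premises cluster or a managed public-cloud offering.
config.load_kube_config()

# Hypothetical analysis container; image and command are placeholders.
container = client.V1Container(
    name="analysis",
    image="example.org/physics/higgs-analysis:latest",
    command=["python", "run_analysis.py"],
)

job = client.V1Job(
    api_version="batch/v1",
    kind="Job",
    metadata=client.V1ObjectMeta(name="higgs-analysis"),
    spec=client.V1JobSpec(
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(containers=[container], restart_policy="Never")
        ),
        backoff_limit=2,  # retry a failed pod up to twice
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```

Because the job describes the workload rather than the machine, the scheduling, retries and monitoring Rocha mentions come from the cluster itself, not from software CERN has to maintain.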

Heinrich agreed. “What’s kind of special about scientific applications is that we don’t usually just have our entire code base on one software stack. Sometimes you have a complete mix between C++, Python, Fortran, and all that stuff. So this idea that we can build the software stack as we want is pretty important.”

Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s coverage of the KubeCon + CloudNativeCon event. (* Disclosure: While this segment is unsponsored, Red Hat Inc. is the headline sponsor for theCUBE’s live broadcast at KubeCon + CloudNativeCon. Neither Red Hat nor any other sponsor has editorial control over content on theCUBE or SiliconANGLE.)

Photo: SiliconANGLE
