How climate researchers use high-performance computing + Big Data | #IBMEdge
This week theCUBE is at IBM Edge in Las Vegas, broadcasting live for SiliconANGLE. In this interview, Pamela Gillman, manager of the Data Analysis Services Group at the National Center for Atmospheric Research (NCAR), sat down with Dave Vellante and Jeff Frick to talk about the resources her team uses for conducting research on the past, present and future of our climate.
NCAR provides resources that allow atmospheric researchers to study the climate. It’s federally funded and managed by the University Corporation for Atmospheric Research (UCAR), a consortium of universities that serves as its governing body.
Unlike weather forecasting, climate research is long-term. Gillman mentioned that some researchers run models that look at what happened in the weather over a 1,000-year period, searching for cyclic patterns and correlations from year to year as well as over longer stretches of time.
Regarding today’s climate, Gillman said that “the majority of scientists believe that we’re in a period of change,” citing melting ice sheets and inconsistent weather changes. She said that one group she works with is studying the climate of the Paleolithic era, trying to determine whether its predictive models can reproduce what happened in that period.
“They’re trying to look at what they think can occur and what’s happening,” said Gillman, explaining that this research mostly aims to figure out why the current changes in the climate are occurring, to determine whether they are normal, and to reach a consensus.
NCAR’s supercomputer center
To clarify what NCAR offers researchers, Frick asked Gillman what the center specifically provides. Gillman said that NCAR is made up of five or six labs, all but one of which do the science. She works with two groups that run the models; one produces the Paleolithic and future climate data, while the other focuses on hurricane forecasting. The Computational and Information Systems Lab, where Gillman works, provides and manages the computing resources those groups use. Its 25,000 sq. ft. supercomputer center houses NCAR’s flagship iDataPlex system.
- Data and Flash
Vellante then asked Gillman what NCAR’s data sources are. Gillman said that most of the data is produced either on NCAR’s own system or at other national centers. She added that one focus is on shifting data transfer protocols to the spinning storage, so that even when data is produced elsewhere, they can retrieve it efficiently. NCAR’s facility currently has about 33 petabytes in its tape archive and about 18 petabytes of available spinning storage.
At this time, NCAR doesn’t use flash, but it is very interested in moving forward with it and is looking at flash as a burst buffer. NCAR’s models step through time and write a lot of small output files at each step. Gillman said that having flash close to where the data is produced would let it absorb the output from those time steps and then allow the data to trickle out to spinning storage.
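To make the burst-buffer idea concrete, here is a minimal sketch in Python. It is a toy model, not NCAR’s design: the class name `BurstBuffer`, the function `simulate`, and every parameter (capacity, drain rate, files per time step) are hypothetical and chosen only for illustration. A small fast tier accepts each burst of files immediately and drains them to a slower tier in the background; if the fast tier fills up, the writer has to wait.

```python
from collections import deque


class BurstBuffer:
    """Toy model of a small, fast tier sitting in front of slow storage.

    All names and numbers are hypothetical; nothing here reflects a real system.
    """

    def __init__(self, capacity, drain_per_tick):
        if capacity < 1 or drain_per_tick < 1:
            raise ValueError("capacity and drain_per_tick must be at least 1")
        self.capacity = capacity              # files the fast tier can hold
        self.drain_per_tick = drain_per_tick  # files moved to slow storage per tick
        self.pending = deque()                # files waiting in the fast tier
        self.slow_tier = []                   # files that reached slow storage

    def tick(self):
        """Trickle up to drain_per_tick files out to the slow tier."""
        for _ in range(min(self.drain_per_tick, len(self.pending))):
            self.slow_tier.append(self.pending.popleft())

    def write(self, name):
        """Accept one file; return how many ticks the writer had to wait."""
        stalls = 0
        while len(self.pending) >= self.capacity:
            self.tick()   # buffer is full, so the writer waits for a drain
            stalls += 1
        self.pending.append(name)
        return stalls

    def flush(self):
        """Drain everything that is left at the end of the run."""
        while self.pending:
            self.tick()


def simulate(steps, files_per_step, ticks_between_steps, capacity, drain_per_tick):
    """Run a model that writes a burst of small files at every time step."""
    buf = BurstBuffer(capacity, drain_per_tick)
    total_stalls = 0
    for step in range(steps):
        for i in range(files_per_step):
            total_stalls += buf.write(f"step{step:03d}_file{i:02d}")
        for _ in range(ticks_between_steps):
            buf.tick()    # the model computes while the buffer drains
    buf.flush()
    return total_stalls, buf.slow_tier


if __name__ == "__main__":
    stalls, drained = simulate(steps=3, files_per_step=4, ticks_between_steps=4,
                               capacity=4, drain_per_tick=1)
    print(f"writer stalls: {stalls}, files on slow storage: {len(drained)}")
```

In this toy setup, a buffer large enough to hold one full time step never makes the model wait, while a smaller one forces the writer to stall during each burst. That is the trade-off a burst buffer is meant to address.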
- HPC, Big Data and the IPCC
Moving on to the topic of high-performance computing (HPC) and Big Data coming together for commercial applications, Vellante asked Gillman to share her observations on where analytics fits in and how it affects architectures.
Gillman responded by discussing the data output work that she’s done for the Intergovernmental Panel on Climate Change (IPCC) runs, which occur every four to five years. She said that the total amount of data output during her first run (IPCC 4) was 100 terabytes, which wasn’t difficult to manage. Once the data is produced, it’s available to the community for about five years. The IPCC 5 run was completed a few years ago, and Gillman said that NCAR isn’t able to curate all of the data from the run. They have about 1 – 2 petabytes of that data, and that isn’t even all of it.
Gillman then said that they bring that data in and host it in Science Gateways that provide access to the shared resources. This allows analytics to occur so that the community is able to choose portions of data from a run. Gillman explained that they’ve coupled this functionality with their computational side so that the group managing that data can use their resources to pull variables out, package data sets and then deliver them to customers.
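As a rough illustration of what “pulling variables out” and packaging a data set might look like, here is a minimal Python sketch. It does not represent NCAR’s Science Gateway software: the function `package_subset`, the dictionary layout, the variable names and all the values in `TOY_RUN` are made up for this example.

```python
# Hypothetical toy "run": a few variables recorded at four time steps.
# The layout and every value are invented for illustration only.
TOY_RUN = {
    "run_id": "toy-run",
    "steps": [0, 1, 2, 3],
    "data": {
        "surface_temp": [14.0, 14.1, 14.3, 14.2],
        "precip": [2.1, 2.0, 2.4, 2.2],
        "sea_level_pressure": [1013.0, 1012.5, 1011.8, 1012.9],
    },
}


def package_subset(run, variables, first_step, last_step):
    """Return only the requested variables over an inclusive range of steps."""
    missing = [name for name in variables if name not in run["data"]]
    if missing:
        raise KeyError(f"variables not in run: {missing}")
    window = slice(first_step, last_step + 1)  # inclusive on both ends
    return {
        "run_id": run["run_id"],
        "steps": run["steps"][window],
        "data": {name: run["data"][name][window] for name in variables},
    }


if __name__ == "__main__":
    # A user asks for one variable over the middle two time steps.
    print(package_subset(TOY_RUN, ["surface_temp"], 1, 2))
```

The point of the sketch is that the subsetting happens next to the data, so the user receives a small package instead of downloading the whole run.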
Information-centric business model
Vellante then asked Gillman what architectural changes we should expect in the next few years. Gillman started off by saying that, in the past, the job of the supercomputer guys was to have a fast machine that could put data out as quickly as possible. After that, it was someone else’s problem, and the data moved from resource to resource for each task.
Gillman then said that what NCAR did with the data center was pull all those resources together into a central pool. “So, what we’re trying to do is shift to where, as the data is produced, somebody can look at it, and they don’t have to move it,” she explained. Gillman added that they’ve referred to this “as an information centric model of trying to get the user to move what they’re doing to where the data is.”
Gillman would like to see a tighter coupling of computation and analysis, and hopes that analytics will become possible during computation. She also believes that flash plays into this. If some data can be held there, and post-processing or analysis can be done on it before it goes to spinning storage, that would speed up the workflow.
Vellante said that her concept sounds Hadoop-like. Gillman said that they don’t currently use Hadoop technology because their codes aren’t structured to work well with Hadoop at this time. The challenge is that climate codes are very large. In fact, NCAR’s setup is six pieces of code that talk to one another. They do, however, have an effort underway to look at their codes and see if they can incorporate newer technologies.