UPDATED 15:00 EDT / MAY 29 2020


Docker helps Australia cure cancer, one child at a time

Containerization is helping fight childhood cancer, as Docker Inc. makes big-data research pipelines scalable and simplifies the sharing of data sets.

Cancer kills more children in developed countries than any other disease. It is the number one cause of death in school-aged children, according to a study published in the New England Journal of Medicine, and the only disease to rank among the top five causes of death among children from birth to 19 years of age. But that figure could drop to zero thanks to faster, more efficient research pipelines that allow researchers to create precision treatments that target the unique cancer within each individual child.

“What we do is we find the mutations that are causing the cancer, and that helps us determine what treatments or what clinical trials might be most effective for the kid,” said Sabrina Yan (pictured, right), bioinformatics research assistant at the Children’s Cancer Institute in Sydney, Australia.

“We’ve made a substantial impact on the survivability of several high-risk cancers in pediatrics,” stated Kamile Taouk (pictured, left), who is a student and intern.

Taouk and Yan spoke with John Furrier, host of theCUBE, SiliconANGLE Media’s livestreaming studio, during DockerCon Live. They discussed how the Children’s Cancer Institute has “Dockerized” its big-data analysis pipeline, enabling fast, efficient and cost-effective individualized cancer research for pediatric patients. (* Disclosure below.)

Personalization makes cancer treatments more effective

As every engineer knows, even the most precisely planned processes can veer off course. The human cell is no exception. The first known diagnosis of a malignant tumor is shown on a papyrus from Ancient Egypt. And for centuries, the disease was thought to have no cure. Even now, many fear the diagnosis as a death sentence.

Despite the figures that show cancer as the leading cause of death for children, the survival rate is an optimistic 84% across children and adolescents in the U.S. But those figures drop dramatically when narrowed down to aggressive forms of cancer.

Children’s Cancer Institute is helping to lead the Zero Childhood Cancer Program, which includes a study of some of the approximately 200 young Australians diagnosed with high-risk cancers each year. These children currently face only a 30% chance of living to adulthood. That containerization could help change these odds may seem a far-fetched use of the technology. But cancer has gone from being untreatable to beatable, and modern data science is behind the change.

Most cancers are treated with a combination of surgery, radiation and chemotherapy. However, “children are unique in the sense that a lot of the typical treatments we use for adults may or may not work, or will have adverse side effects in kids,” Yan stated. This is where technology comes into play, enabling researchers to focus on the profile of a patient’s individual tumor through genetic sequencing.

“That allows us to specialize the medication and the treatment for that patient and essentially lets us improve the efficiency and the effectiveness of the treatment, which in turn obviously has an impact on the survivability of the cancer,” Taouk stated.

Terabytes of data per child


Discovering a patient’s genomic profile requires building a whole-genome and RNA sequencing pipeline. “We sequence the healthy cells, and we sequence their tumor cells,” Yan said, adding that the information is then analyzed to pinpoint the mutations specific to that child.

While the research process may sound simple when described this way, each patient has several hundred terabytes’ worth of data. And until recently, Taouk, Yan, and fellow researchers relied on an inflexible system of multiple, fine-tuned applications with multiple dependencies. This meant setting up a single pipeline to research treatments for one child could take days, or even weeks.

“And even then, a lot of things didn’t work,” Yan stated.

The problem was that each pipeline required customization for tools with multiple different dependencies. “To install four different versions of Python, or three different versions of R, or different versions of Java onto one machine in order just to run [a process] is a bit of a pain,” Yan said.

Updating the technology to make the process more agile and scalable, and to be able “to move the pipeline to the data rather than the other way around,” would save time and expense, according to Taouk. The solution: containerization using Docker.

But while the Children’s Cancer Institute’s research team knew the end result would be worth it, the process was anything but simple. “It’s been quite a project to convert the pipeline that we have,” Yan said. Over the past months, she and Taouk have been working together on a project that involved individually converting each of the tools used in the pipeline to Docker. “That’s just such a pain to do,” she said.

“It was quite difficult, because we had to preserve every single version of every single dependency in one instance, just to ensure that app was working,” Taouk agreed. As the apps get updated regularly, the team also had to ensure that its Docker containers would survive updates, he added.

Each member of the team was assigned a bioinformatic tool from the pipeline to Dockerize individually. Some tools would be easy and done in a matter of hours. Others could take weeks, according to Yan. “When it comes to bioinformatic tools, some of them are very memory-hungry; some of them are very finicky; some of them are a lot more stable than others,” she said.

The hard work paid off when the team started using Docker to run the pipeline. “Eventually, you slog through this process. Then you have a Dockerfile set up where anyone can run it on any system, and we know we have an identical setup,” Yan stated.
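As a rough illustration of the "identical setup" Yan describes, the Python sketch below builds a `docker run` command for one containerized tool. It is not the institute's code: the image names, tags, registry host, and file paths are all hypothetical, and the only Docker features assumed are the standard `--rm` and `-v` options of `docker run`. The point is that pinning an image tag fixes a tool's dependencies, so the same command behaves the same way on any machine with Docker installed.

```python
# Hypothetical image names and tags. Pinning the tag fixes each tool's
# dependencies (Python, R, Java versions and so on) inside its own image.
TOOL_IMAGES = {
    "aligner": "example.org/pipeline/aligner:1.4.2",
    "variant_caller": "example.org/pipeline/variant-caller:0.9.0",
}


def docker_command(tool, data_dir, args):
    """Build the argument list for running one containerized tool."""
    return [
        "docker", "run", "--rm",      # remove the container when it exits
        "-v", f"{data_dir}:/data",    # mount the host data directory at /data
        TOOL_IMAGES[tool],            # pinned image for this tool
        *args,                        # arguments passed to the tool itself
    ]


# Example: the command is only built here, not executed.
cmd = docker_command("aligner", "/mnt/patient_001", ["--input", "/data/tumor.fastq"])
print(" ".join(cmd))
```

On a machine with Docker installed, a list like `cmd` could be handed to `subprocess.run(cmd, check=True)`. Mounting the data directory into the container also reflects the idea of moving the pipeline to the data rather than the other way around.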

Having each tool contained within an individual Docker container means that a pipeline is created by linking the inputs and outputs. But “it’s not as simple as run A, run B, run C, and then you’re done,” Yan stated. She described the completed pipeline as looking “like a gigantic web of applications.” But while it may look complicated, the end result is simple.

“It’s ridiculously efficient … we can absolutely guarantee that that application will run successfully and effectively every single time,” Taouk said.

“And if an individual tool fails for whatever reason … you can re-run that one individual tool, re-hook the outputs into whatever the next program is, and keep going,” Yan added.
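The "web of applications" and the re-run behavior Yan describes can be sketched as a dependency graph. The Python example below is a minimal illustration under stated assumptions, not the institute's pipeline: the tool names are invented, and `flaky_tool` is a stand-in for launching a real container. It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9 or later) to run each tool only after the tools it depends on, and it lets a second attempt reuse the outputs that already exist.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each tool maps to the tools whose outputs it consumes.
PIPELINE = {
    "align_normal": set(),
    "align_tumor": set(),
    "call_variants": {"align_normal", "align_tumor"},
    "annotate": {"call_variants"},
}


def run_pipeline(pipeline, run_tool, done=None):
    """Run each tool once its inputs exist, skipping tools already in `done`.

    Returns (outputs, failed), where `failed` is the name of the tool that
    raised an exception, or None if every tool finished.
    """
    outputs = dict(done or {})
    for tool in TopologicalSorter(pipeline).static_order():
        if tool in outputs:
            continue  # finished in an earlier attempt, so reuse its output
        inputs = {dep: outputs[dep] for dep in pipeline[tool]}
        try:
            outputs[tool] = run_tool(tool, inputs)
        except Exception:
            return outputs, tool  # stop here and keep what already succeeded
    return outputs, None


calls = []


def flaky_tool(tool, inputs):
    """Stand-in for a containerized tool; the variant caller fails once."""
    calls.append(tool)
    if tool == "call_variants" and calls.count(tool) == 1:
        raise RuntimeError("simulated crash")
    return f"{tool}.out"


# First attempt stops at the failing tool; the second resumes from there.
outputs, failed = run_pipeline(PIPELINE, flaky_tool)
first_failure = failed
outputs, failed = run_pipeline(PIPELINE, flaky_tool, done=outputs)
```

In this sketch the two alignment steps run only once across both attempts. Only the failed tool and the steps downstream of it are executed on the second pass.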

Sharing data to save lives

By implementing Docker into their big-data analysis pipeline, the Children’s Cancer Institute research team is able to share resources with other pediatric cancer research organizations across the world, and close to home.

Australian BioCommons and Cavatica have a specialized platform for developing pipelines with built-in support for Docker containers. Collaborating as part of the project, the Children’s Cancer Institute research team can access uploaded data, share its own data, and contribute to the continued development and growth of the platform.

“We wouldn’t have been able to do that if we hadn’t Dockerized our app. It just wouldn’t have been possible,” Yan stated.

One of the big benefits of Docker is that it has “massively increased the capacity of the pipeline,” said Yan. Having a flexible, scalable pipeline where “we can just double the memory; double the amount of data … change the instances freely to just double the capacity, triple the capacity” means more than speed and reduced costs. It means more children are able to access personalized cancer treatments.


Only high-risk children with a low survival prognosis are currently able to have their tumors personally profiled. Thanks to Docker, that is going to change. As part of the Zero Childhood Cancer Program, Yan and Taouk are working to bring Australia’s child cancer mortality rate to zero.

“In the future, we’ll actually be able to open up the program to every child in Australia that has cancer,” Yan concluded.

Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s coverage of DockerCon Live. (* Disclosure: TheCUBE is a paid media partner for DockerCon Live. Neither Docker Inc., the sponsor for theCUBE’s event coverage, nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

