As Big Data gets even bigger, we're going to need more capable ways of storing it than current technology allows. One idea that has been floated before is using DNA as a storage medium, and it's no longer just a theory: scientists at Harvard University say it will soon be possible to store the entire content of the World Wide Web in just 75 grams of DNA material.
Led by George Church and Sri Kosuri at Harvard University's Wyss Institute, a team of researchers recently figured out a way to store a whopping 5.5 petabits (around 700 terabytes) of data in a single gram of DNA. The method is fairly technical, but the technology site IndustryTap does a good job of explaining how it's done:
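The 700-terabyte figure is worth a quick sanity check: it only lines up if the 5.5 is measured in petabits (10^15 bits), not petabytes. A few lines of arithmetic confirm the conversion:

```python
# Unit check on the figure above: 5.5 petabits per gram, expressed in terabytes.
petabits = 5.5
bits = petabits * 1e15          # 1 petabit = 10^15 bits
bytes_total = bits / 8          # 8 bits per byte
terabytes = bytes_total / 1e12  # 1 terabyte = 10^12 bytes
print(terabytes)                # 687.5, i.e. roughly 700 TB per gram
```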
“Rather than enciphering binary data on magnetic drives, scientists are leveraging strands of DNA to microcode data. DNA, capable of storing 96 bits, is synthesized with each of the TGAC bases representing a binary value: T & G representing 0, and A & C representing 1. Parsing data stored in DNA is as simple as using existing sequencing apparatuses and converting or distilling each base back to binary code.”
“DNA is very dense and can store one bit per base with each base only a few atoms large. DNA is also volumetrical meaning it is stored in a beaker or other incurvation rather than a hard disk. Finally while some advanced storage systems need to be kept in subzero vacuums, DNA can be stored in a box in your house.”
One problem remains on the technology side: it's currently neither easy nor cost-effective to store or read data this way. However, technology moves fast, with lab-on-a-chip devices and advanced microfluidics progressing to the point where it's possible to analyze a human genome within just a few hours. As the science advances, the team at Harvard believes it won't be long before it's possible to store 100 million hours of high-definition video in a single cup of DNA, a capability that can't come soon enough given the huge amounts of data the Internet of Things will generate.
DNA sequencing in the cloud
Ironically, Big Data can also help to advance our understanding of DNA. With the ability to manage incredibly large data sets, we're able to better understand our own genetic makeup. DNA sequencing is now a viable area of medical research, but until recently, scientists lacked the computing resources to push it forward. Now, however, a number of startups are offering cloud services specializing in genomic research to help them do just that.
One of the best examples of this is a genetic research program at Baylor University called the Cohorts for Heart and Aging Research in Genomic Epidemiology, or CHARGE, which aims to identify genes linked to a higher risk of heart disease.
Their project involves both whole-genome sequencing of dozens of individuals and broader population research, and it generates far more data than the researchers know what to do with. To get a grip on the numbers, the team sought the help of a PaaS startup called DNAnexus and sequenced the DNA of more than 14,000 individuals, including 3,751 whole genomes and 10,771 exomes, an operation that took some 2.4 million core-hours of computational time. By running the data in the cloud, CHARGE's researchers were able to perform this work 12 times faster than if they'd used their own computers, and produced a massive 430 TB of data that is now being analyzed in the hunt for genetic links to heart disease.
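To get a feel for what 2.4 million core-hours means in wall-clock terms, here is a rough back-of-the-envelope calculation. The cluster size is a hypothetical assumption for illustration; the article only reports the core-hour total.

```python
# Rough scale of the CHARGE computation. The 20,000-core cluster size is
# an assumed figure, not from the article, which only gives core-hours.
core_hours = 2.4e6
assumed_cores = 20_000
wall_hours = core_hours / assumed_cores
print(wall_hours)             # 120.0 hours on 20,000 cores, i.e. five days
print(wall_hours * 12 / 24)   # 60.0 days at 1/12 the speed on local hardware
```

Burst access to tens of thousands of cores at once, without owning them, is exactly the property that makes the cloud a fit for this kind of workload.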
“Many large-scale population studies to date have been limited in scope by a lack of the necessary compute power; this is a real hindrance in realizing the full promise of genomic medicine,” said Richard Daly, CEO of DNAnexus.
Big Data research projects such as CHARGE are among the most compelling use cases for cloud computing. It might still be some way off, but with the help of companies like DNAnexus, the full promise of genomic medicine is finally within our grasp.