Feeding the world with Big Data: The 3000 Rice Genome project
With the world’s human population set to rise from 7.125 billion now to more than 9 billion by the year 2050, scientists are turning to Big Data to ensure we can keep everyone well fed.
The 3,000 Rice Genome sequencing project has just made a new 120 TB dataset available on Amazon Web Services’ cloud, and is inviting researchers to lend their expertise to the effort.
According to the United Nations Food and Agriculture Organization, humanity will need to increase food production by up to 70 percent by 2050 if the world's expanding population is to have enough to eat. Rice, the staple food for more than half of the world's current population, is an obvious candidate for increased production. It accounts for more than 20 percent of the calories humans consume, but current methods for increasing rice yields are insufficient to meet future needs, especially once the effects of pollution and climate change are taken into account. What's needed are modern breeding methods based on the underlying genetic data.
That's what the 3,000 Rice Genome sequencing project is all about. The international effort is being undertaken by the Chinese Academy of Agricultural Sciences, BGI Shenzhen, and the International Rice Research Institute (IRRI) in collaboration with DNAnexus, and has successfully sequenced the genomes of 3,024 rice varieties originating from 89 countries.
The project has accumulated more than 120 terabytes of data encompassing over 30 million genetic variations across the 3,024 rice varieties, and this data has now been made available through AWS. The idea is that specialists and researchers from all over the world will be able to chip in, helping to identify and compare crop yields, climate stress tolerance, disease resistance and other important agronomic traits. This data will then be merged with other data to help scientists develop predictive models and modern methods of breeding and cultivating rice to provide for future generations.
The dataset was processed on 37,000 compute cores in AWS' infrastructure, which allowed the consortium to work through it 200 times faster than it could have with traditional computing infrastructure. Interested specialists are invited to access the data on the 3000 Rice Genome Public Data Set page.
Interpreting the dataset will be a formidable challenge. The consortium is asking for help to systematically mine the 3000 Rice Genome dataset so it can link genotypic variations to functional variations and come up with new, more sustainable varieties of rice.
Image credit: Hbieser via pixabay.com