UPDATED 12:00 EST / JULY 16 2020


GitHub preserves its open-source software code deep in the Arctic for future generations

GitHub Inc. said today it has delivered a copy of all of the open-source software code stored on its website to a data repository at the Arctic World Archive, which is a very long-term archival facility buried 250 meters deep in the permafrost of an Arctic mountain.

The operation is part of the GitHub Archive Program, a project announced last year that aims to preserve today’s open-source software for future generations. To do that, GitHub said, it will store its code in an archive called the GitHub Arctic Code Vault, which it says has been built to last for a thousand years.

GitHub said it carried out the operation in partnership with a long-term data storage company called Piql, which copied the entire contents of its active public repositories and wrote that data to 186 reels of hardened microfilm. The microfilm was then shipped to Svalbard, a Norwegian archipelago located inside the Arctic Circle, and transported to a decommissioned coal mine set within a mountain that’s now home to the Arctic World Archive.

Once there, the encoded microfilm was placed inside the GitHub Arctic Code Vault, which is a deep chamber that’s buried inside hundreds of meters of permafrost.


The operation is still far from complete, though. In a blog post, GitHub said the next step is to create what it calls a “Tech Tree,” which is a document that contains the technical history and cultural context of the GitHub Archive Program.

The idea behind the Tech Tree is to compile existing works that provide a more detailed understanding of modern computing and software development, open-source software and its applications, and popular programming languages. The Tech Tree will also explain the technologies that make software possible, including microprocessors, networking, electronics and even pre-industrial technologies. That, GitHub said, is intended to help the archive’s inheritors better understand today’s world and its technologies, and perhaps even enable them to recreate computers that can run the archived software.

To recognize the millions of developers who have contributed to the open-source software that’s now stored in the vault, GitHub has also created a new badge that will be displayed in the highlights section of each user’s profile.


The archive program is just one of several initiatives GitHub has launched to try to preserve the open-source software code it hosts. The company is also working with a nonprofit organization called the Internet Archive, which provides free public access to various collections of digitized content.

GitHub said the Internet Archive began archiving the content from its public software repositories in April this year, using its Wayback Machine to archive the raw data as Web ARChive (WARC) files. To date, it has archived more than 55 terabytes of code, GitHub said.

In addition, the Internet Archive is planning to make those archived repositories available via a “git clone” that will also preserve things such as repo comments, issues and other metadata, and make them easily accessible via the internet. That initiative is “well underway” and initial archiving is expected to start this month, GitHub said.

GitHub is also partnering with another nonprofit group, called Software Heritage, to preserve and share the source code of its software commons. Software Heritage has already archived more than 130 million different projects, and 100 million of those came from GitHub, the company said. The Software Heritage archive is publicly accessible now.

GitHub further announced a partnership with Project Silica, a Microsoft Research project that aims to develop sustainable, reliable storage technology for long-lived data.

Project Silica is using new techniques in ultrafast laser optics to store data in fused quartz glass, using a process that permanently changes the physical structure of the material. Quartz glass is a durable storage medium that offers “unparalleled data lifetimes of upwards of tens of thousands of years,” GitHub said. It can also resist electromagnetic interference, water and heat, it said.

Images: GitHub
