UPDATED 10:41 EDT / MARCH 12 2013

Quantcast’s Gift to Open Source: Room to Grow Big Data

Quantcast, by its own admission, has been dealing in Big Data since 2006, before it was cool. Jim Kelly, VP of Research and Development at Quantcast, stopped by theCube during Strata last month to give some background on the Quantcast File System (QFS). Offered free to the open source community as an alternative to HDFS (the Hadoop Distributed File System), QFS is Quantcast’s bid to deliver better cost efficiency at large scale to anyone who adopts it.

QFS started five to six years ago, when Quantcast began developing many technologies internally to handle the volume of data it was receiving. Released in September 2012, it is a direct alternative to HDFS. One problem QFS tries to fix is that Big Data sets tend to grow and carry high operating costs: power and computing can quickly become a six- to seven-figure monthly operating expense. A goal of QFS, then, was to build a more efficient file system that makes better use of space.

QFS effectively doubles the storage capacity of a Hadoop cluster compared to stock HDFS.

The #1 challenge in designing a distributed file system is fault tolerance. The software needs to tolerate bits of your data going missing. HDFS handles this by making three copies of the data. QFS instead uses Reed-Solomon encoding (the same error correction used in CDs and DVDs). The space savings are big: QFS stores the equivalent of 1.5 copies, half of what HDFS needs.
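
The capacity claim follows from simple arithmetic, sketched below in Python. The function name and the 900 TB cluster size are made up for illustration; only the 3-copy and 1.5-copy figures come from the interview.

```python
def usable_tb(raw_tb: float, copies_stored: float) -> float:
    """Logical data that fits on raw_tb of disk when every byte of
    data costs `copies_stored` bytes of raw storage."""
    return raw_tb / copies_stored


# Hypothetical cluster with 900 TB of raw disk (not a Quantcast figure).
raw = 900
replicated = usable_tb(raw, 3.0)  # three full copies of everything
encoded = usable_tb(raw, 1.5)     # equivalent of 1.5 copies

print(f"3 copies:   {replicated:.0f} TB usable")
print(f"1.5 copies: {encoded:.0f} TB usable")
print(f"ratio:      {encoded / replicated:.1f}x")
```

On the same hardware, storing 1.5 copies instead of 3 leaves room for twice as much data, which is where the "doubles storage capacity" figure comes from.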

QFS splits data into six data slices and three parity slices and writes them to nine separate places by default. If QFS can read any six of the nine, it can reconstruct the data, so it can lose up to three. With HDFS you can only lose two copies, so QFS has better fault tolerance too.
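
The "any six of nine" property can be illustrated with a toy Reed-Solomon-style code in Python. This is a sketch of the general idea only, not QFS's implementation: it assumes one small number per slice, does its arithmetic modulo the prime 257, and every name in it is hypothetical. The six data values are treated as points on a polynomial, and the three parity values are three more points on the same curve.

```python
from itertools import combinations

P = 257  # prime just above 255, so every data byte is a field element


def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial through points [(xi, yi)], mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, parity_count=3):
    """Return (index, value) pairs: the data slices followed by parity slices."""
    points = list(enumerate(data))
    first_parity = len(data)
    parity = [(x, lagrange_eval(points, x))
              for x in range(first_parity, first_parity + parity_count)]
    return points + parity


def recover(surviving, data_count=6):
    """Rebuild the original data from any data_count surviving slices."""
    points = surviving[:data_count]
    return [lagrange_eval(points, x) for x in range(data_count)]


data = [81, 70, 83, 0, 255, 42]  # six arbitrary byte values
slices = encode(data)            # nine slices in total

# Drop every possible set of three slices and rebuild from the remaining six.
all_recovered = all(recover(list(kept)) == data
                    for kept in combinations(slices, 6))
print(all_recovered)
```

Because six points pin down a polynomial of degree five or less, any six surviving slices describe the same curve, and evaluating it at positions 0 through 5 returns the original data. A parity value in this toy can equal 256, which does not fit in a byte; production codes avoid that by working in a binary field instead.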

Here are some interesting Quantcast numbers that show host Dave Vellante got Kelly to confirm during the interview:

  • 50 terabytes of data in the door per day
  • more than 20 petabytes processed on an average day
  • 1,000 machines (reasonably modest commodity hardware)

While he remained vague, Kelly said that Quantcast would measure success by the number of high-quality collaborators who help extend the product. File systems are an especially critical piece of the infrastructure puzzle. QFS stands to benefit from the scrutiny of open source, and Hadoop stands to benefit from having another file system its framework can run on. The giveback of QFS to open source is a win-win for all.

See Kelly’s full segment below.

http://youtu.be/3fXArMUBrrQ
