

Quantcast, by its own admission, has been dealing in Big Data since 2006 — before it was cool. Jim Kelly, VP of Research and Development at Quantcast, stopped by theCube during Strata last month to give some background on the Quantcast File System (QFS). An alternative to HDFS, free to the open source community, QFS is Quantcast's bid to deliver better cost efficiency at large scale to anyone who adopts it.
QFS started five to six years ago, when Quantcast began building a lot of technology internally to handle the data volume it was taking in. Released in September 2012, it is a direct alternative to HDFS. The problem QFS is trying to fix is that Big Data sets tend to keep growing and carry high operating costs; computing power can quickly become a six- to seven-figure monthly operating expense. So with QFS, a goal was to build a more efficient file system that makes better use of space.
QFS effectively doubles the storage capacity of a Hadoop cluster compared to stock HDFS.
The #1 challenge in designing a distributed file system is fault tolerance: the software needs to tolerate bits of your data going missing. HDFS handles this by making three full copies of everything. QFS instead uses Reed-Solomon encoding (the same error-correction scheme used in CDs and DVDs), which stores the equivalent of only 1.5 copies. That's a big space savings: half the footprint of HDFS.
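The space arithmetic behind that claim can be sketched in a few lines (the 6+3 slice counts are QFS's defaults, mentioned below; the 100 TB data set size is a made-up example):

```python
# Back-of-the-envelope storage comparison:
# HDFS 3x replication vs. QFS Reed-Solomon 6+3 erasure coding.

DATA_TB = 100  # hypothetical raw data set size, in terabytes

# HDFS keeps three full copies of every block.
hdfs_raw = DATA_TB * 3

# QFS splits each stripe into 6 data slices plus 3 parity slices,
# so the stored footprint is (6 + 3) / 6 = 1.5x the raw data.
qfs_raw = DATA_TB * (6 + 3) / 6

print(f"HDFS footprint: {hdfs_raw} TB")   # 300 TB
print(f"QFS footprint:  {qfs_raw} TB")    # 150.0 TB
print(f"QFS uses {qfs_raw / hdfs_raw:.0%} of the HDFS footprint")  # 50%
```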
QFS writes data slices and parity slices (six data slices and three parity slices) to nine separate places by default. If QFS can read any six of the nine, it can reconstruct the data, so it can survive the loss of three slices; with HDFS's three copies you can only lose two. QFS thus has better fault tolerance as well.
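To get a feel for how parity lets lost slices be rebuilt, here is a toy sketch using a single XOR parity slice. This is a drastic simplification: XOR parity tolerates only one lost slice, while QFS's Reed-Solomon 6+3 code tolerates any three of nine. The data, slice size, and six-slice layout here are made up for illustration; only the principle (redundancy computed across slices allows reconstruction) carries over.

```python
# Toy erasure-coding demo: six data slices plus one XOR parity slice.
# Real Reed-Solomon (as in QFS) generalizes this to multiple parity
# slices and multiple tolerated failures.

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings together."""
    return bytes(x ^ y for x, y in zip(a, b))

# Split 18 bytes of example data into six 3-byte "slices".
data = b"quantcast file sys"
slices = [data[i:i + 3] for i in range(0, len(data), 3)]

# Compute one parity slice as the XOR of all data slices.
parity = slices[0]
for s in slices[1:]:
    parity = xor_bytes(parity, s)

# Simulate losing one slice, then rebuild it from survivors + parity.
lost_index = 2
survivors = [s for i, s in enumerate(slices) if i != lost_index]
rebuilt = parity
for s in survivors:
    rebuilt = xor_bytes(rebuilt, s)

assert rebuilt == slices[lost_index]
print("rebuilt slice:", rebuilt)  # b'ast'
```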
Here are some interesting Quantcast numbers that theCube host Dave Vellante got Kelly to confirm during the interview:
See Kelly’s full segment below.