What is EMC’s VFCache Good For?


EMC’s VFCache announcement Monday is a major departure from the vision of flash memory startups such as Fusion-io. The question is, what is VFCache good for, and is it more than just an attempt by EMC to freeze the market and maintain a place for its bread-and-butter hard disk array products?

Ironically, EMC pioneered flash in the data center when it added flash front-ends to its high-end Symmetrix Fibre Channel disk arrays in 2008. The problem that flash is intended to address, as EMC President and COO Pat Gelsinger says in an interview with Wikibon Chief Analyst David Vellante, is the growing gap between the amount of data that the server CPU, which doubles in power every 18 months, can consume and the slow IO speed of spinning disk, which has not changed over two decades.

EMC’s 2008 design put flash in the array, below the IO software stack. Since then several startups have moved flash into the server, putting it on the server PCIe (Peripheral Component Interconnect Express) bus. This provides fast random-access reads and writes combined with persistent memory to protect the data in case of a server crash or power failure. The shift left EMC behind, as the figures show. While Gelsinger bragged that EMC sold 25 Pbytes of flash in 2011, Fusion-io CEO Dave Flynn responded that his company sold 50 Pbytes. And Fusion-io is only one of several startups in the market.

VFCache is EMC’s bid to get into this market, and it is in large part a defensive move. Gelsinger inadvertently admitted as much when he said, “Our customers are saying that the combination is really fabulous, so I won’t go with Exadata or someone else.” (emphasis added)

But VFCache differs in important ways from the competition. First, while the startup flash vendors are providing full read/write functionality on the server, VFCache is a read-only cache. New data is written through the cache to the EMC storage array, which slows writes. Second, VFCache is designed to work with the traditional storage stack, while Fusion-io’s vision (and that of the other flash startups) is of single-tier storage with all data residing in flash on the server. EMC believes that this will be prohibitively expensive for much of enterprise data, and Gelsinger talks of “hot”, “warm”, and “cold” data and envisions a three-tier system.

“It’s not unusual to see 80% of the IOPS on 20% of the data,” says EMC CTO of Flash Products Dan Cobb. “So what about the other 80% of data?” Putting that data on flash would be cost-prohibitive.
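
The write-through behavior described above can be illustrated with a minimal Python sketch. This is a toy model, not EMC’s implementation: the class name, the dictionary standing in for the storage array, and the least-recently-used eviction policy are all assumptions made for illustration. It shows only the general pattern, in which reads can be served from the cache while every write goes to the backing store.

```python
from collections import OrderedDict


class WriteThroughReadCache:
    """Toy model of a server-side read cache in front of a slower array.

    All names here are hypothetical; LRU eviction is an assumption.
    """

    def __init__(self, backing_store, capacity):
        self.backing = backing_store   # dict standing in for the storage array
        self.capacity = capacity       # maximum number of cached blocks
        self.cache = OrderedDict()     # block id -> data, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, block):
        if block in self.cache:
            # Cache hit: serve from flash and mark the block as recently used.
            self.hits += 1
            self.cache.move_to_end(block)
            return self.cache[block]
        # Cache miss: fetch from the array, then keep a copy for next time.
        self.misses += 1
        data = self.backing[block]
        self.cache[block] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used block
        return data

    def write(self, block, data):
        # Write-through: the array always receives the write, so write
        # latency is bounded by the array, not by the flash cache.
        self.backing[block] = data
        if block in self.cache:
            # Assumption: refresh an already-cached copy so reads stay consistent.
            self.cache[block] = data
            self.cache.move_to_end(block)


array = {i: f"block-{i}" for i in range(10)}
cache = WriteThroughReadCache(array, capacity=2)
cache.read(1)            # miss: fetched from the array
cache.read(1)            # hit: served from the cache
cache.write(1, "new")    # goes to the array and refreshes the cached copy
print(cache.hits, cache.misses, array[1])
```

Because the array holds every write, losing the cache loses no data, which is the trade-off a write-through design makes in exchange for slower writes.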

Wikibon’s View

Wikibon tends to agree with EMC on this. Vellante in his recorded analysis says, “Flash will become the predominant medium for IO-intensive applications.” The relevant measure for these applications will not be the traditional cost-per-Gbyte but rather cost-per-IO. By this measure, flash is already less expensive for these applications and will “enable a new breed of applications that were once too expensive to justify based on the IO economics of spinning disk.”
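
The difference between the two measures is simple arithmetic, sketched here in Python. The prices, capacities, and IOPS figures are invented purely for illustration and are not vendor data or figures from Wikibon or EMC. The point is only that the same pair of devices can rank in opposite order depending on which metric is used.

```python
def cost_per_gbyte(price_usd, capacity_gbytes):
    """Traditional capacity metric: dollars per Gbyte stored."""
    return price_usd / capacity_gbytes


def cost_per_iops(price_usd, iops):
    """Performance metric: dollars per IO operation per second delivered."""
    return price_usd / iops


# Hypothetical devices. Every number below is an assumption made up for
# this example, not a measured or quoted figure.
HYPOTHETICAL_DEVICES = {
    "disk": {"price_usd": 300.0, "capacity_gbytes": 600.0, "iops": 200.0},
    "flash": {"price_usd": 6000.0, "capacity_gbytes": 300.0, "iops": 100000.0},
}


def compare(devices):
    """Return both metrics for each device."""
    return {
        name: {
            "per_gbyte": cost_per_gbyte(d["price_usd"], d["capacity_gbytes"]),
            "per_iops": cost_per_iops(d["price_usd"], d["iops"]),
        }
        for name, d in devices.items()
    }


summary = compare(HYPOTHETICAL_DEVICES)
for name, costs in summary.items():
    print(f"{name}: ${costs['per_gbyte']:.2f} per Gbyte, "
          f"${costs['per_iops']:.2f} per IOPS")
```

With these made-up inputs the disk is far cheaper per Gbyte while the flash card is far cheaper per IOPS, which is the shape of the argument Vellante makes for IO-intensive applications.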

But for a variety of reasons, including compliance, companies need to maintain very large amounts of data that are not constantly active. Despite the continuing fall of flash prices, flash is unlikely to drop below the cost-per-Gbyte of disk, much less tape, in the foreseeable future.

Dennis Martin, president of Demartek, which conducted an independent evaluation of VFCache, said “The cost-effective approach is to use a small quantity of flash compared to total storage with automated tiering or caching solution….The cache fills up with hot data, making the access times significantly reduced.”

Demartek’s evaluation of VFCache running against a typical Oracle application found that populating the cache fully from the storage array took about an hour but resulted in a 2.6X to 3.3X increase in transactions-per-minute. It also increased write speed to the underlying disk array, which was relieved of much of the read load by the VFCache. Martin suggests that this architecture is most valuable when used with read-intensive workloads with small IO block sizes of up to 64K, random IO workloads, and/or multiple IO streams. It would be less effective with write-intensive applications such as those capturing large amounts of transaction data for near-real-time analysis.
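
A back-of-the-envelope model helps show why the read/write mix matters so much for a write-through read cache. The Python sketch below is an assumption-laden simplification: the latency values and hit ratio are hypothetical, not Demartek measurements, and the model ignores the secondary effect noted above, in which the array writes faster once it is relieved of read load.

```python
def average_io_latency_ms(read_fraction, hit_ratio, cache_ms, array_ms):
    """Average latency per IO for a write-through read cache.

    Reads that hit the cache cost cache_ms; read misses and all writes
    cost array_ms, because every write must reach the array.
    """
    read_ms = hit_ratio * cache_ms + (1.0 - hit_ratio) * array_ms
    write_ms = array_ms
    return read_fraction * read_ms + (1.0 - read_fraction) * write_ms


# Hypothetical latencies in milliseconds, chosen only for illustration.
CACHE_MS = 0.1
ARRAY_MS = 5.0

# 90% reads with a warm cache, 20% reads with a warm cache, and
# 90% reads with an empty cache (for example, right after startup).
read_heavy = average_io_latency_ms(0.9, 0.8, CACHE_MS, ARRAY_MS)
write_heavy = average_io_latency_ms(0.2, 0.8, CACHE_MS, ARRAY_MS)
cold_cache = average_io_latency_ms(0.9, 0.0, CACHE_MS, ARRAY_MS)

print(f"read-heavy, warm cache:  {read_heavy:.3f} ms")
print(f"write-heavy, warm cache: {write_heavy:.3f} ms")
print(f"read-heavy, cold cache:  {cold_cache:.3f} ms")
```

Under these assumptions the read-heavy workload sees most of the benefit, the write-heavy workload sees little, and a cold cache offers none until it has been populated.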

This, however, presumes an effective intelligent data management layer to move the most active data to the VFCache and then replace it as that data cools and newer data becomes more active. EMC does have that technology in the form of FAST (Fully Automated Storage Tiering) and Flash Cache, which, according to EMC SVP of Flash Products Mark Sorenson (http://siliconangle.tv/video/emc-launches-vfcache-pcie-flash-solution), it plans to extend to VFCache this year. EMC also promises high-performance deduplication for its flash product. Once this is available, EMC will offer unified multitier data storage from the server to the archiving layer with automated tiering. At present, however, VFCache is an immature product.
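
The idea of promoting active data and replacing it as it cools can be sketched generically in Python. This is not EMC’s FAST algorithm, whose internals are not described here. The class name, the per-interval decay factor, and the top-N selection rule are all assumptions chosen to illustrate one simple way a “heat” score could drive cache placement.

```python
class HotDataTracker:
    """Generic heat tracker (hypothetical, not EMC's FAST).

    Each access adds 1 to a block's heat. At the end of every interval
    all heat values are multiplied by a decay factor, so blocks that
    stop being accessed gradually cool and lose their cache slot.
    """

    def __init__(self, capacity, decay=0.5):
        self.capacity = capacity   # number of blocks the cache can hold
        self.decay = decay         # assumed cooling factor per interval
        self.heat = {}             # block id -> heat score

    def record_access(self, block):
        self.heat[block] = self.heat.get(block, 0.0) + 1.0

    def end_interval(self):
        """Cool every block, then return the set that should be cached."""
        for block in self.heat:
            self.heat[block] *= self.decay
        # Hottest first; ties are broken by block id for determinism.
        ranked = sorted(self.heat, key=lambda b: (-self.heat[b], b))
        return set(ranked[: self.capacity])


tracker = HotDataTracker(capacity=2)

# Interval 1: blocks "a" and "b" are the most active.
for block, count in (("a", 5), ("b", 3), ("c", 1)):
    for _ in range(count):
        tracker.record_access(block)
first = tracker.end_interval()

# Interval 2: "a" goes idle while "c" becomes active.
for block, count in (("b", 2), ("c", 4)):
    for _ in range(count):
        tracker.record_access(block)
second = tracker.end_interval()

print(sorted(first), sorted(second))
```

In this run block "a" is cached after the first interval and dropped after the second, once "c" has overtaken it, which is the promote-and-replace behavior the paragraph above describes.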