MongoDB Brings Speed, Scalability, and Performance to Handling Unstructured Data #mongodbdays


Eugene Dvorkin of WebMD discussed his company’s integration of MongoDB into their infrastructure and development process with theCUBE co-hosts Jeff Kelly and Dave Vellante, live at the MongoDB Days conference in New York City.

WebMD’s drive to bring in MongoDB came from its scalability and performance characteristics, especially sharding. Traditionally, Dvorkin explained, scalability is vertical: scale-up. It has become very expensive to find service and support for this approach. MongoDB offers horizontal scalability: “as you grow, you buy cluster servers and you grow horizontally almost indefinitely.” “We don’t have an application architect working for enterprises, it’s solved by the vendor,” Dvorkin said, referring to the collaboration with 10gen, which handles that through mongos, MongoDB’s query router.
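To make the idea of horizontal scaling concrete, here is a minimal sketch of how a router can spread records across servers by hashing a key. It is an illustration only, not how mongos works internally: MongoDB splits the key space into chunks and rebalances them across shards, whereas this sketch uses a plain hash-and-modulo rule. The function name `pick_shard` and the server names are hypothetical.

```python
import hashlib


def pick_shard(shard_key, shards):
    """Map a key to one of the servers using a stable hash (simplified)."""
    digest = hashlib.md5(shard_key.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


# Hypothetical server names; growing the cluster means growing this list.
shards = ["shard-a", "shard-b", "shard-c"]

# The same key always lands on the same server.
placement = {
    user: pick_shard(user, shards)
    for user in ("user-1", "user-2", "user-3", "user-4")
}
```

The application only ever calls the router; which server holds a given record is the router's concern, which is the part Dvorkin describes as solved by the vendor.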

Another reason to choose MongoDB is that it is very developer friendly: it is “much easier to code against MongoDB as compared to relational databases,” Dvorkin added.

“We have unstructured data” acquired by tracking events in social media (likes, shares, etc.). “We capture it and want to have some use of it. It is Big Data, we get a lot of data, but don’t know what to do with it at first,” Dvorkin explained. Thus there is a need for storage that can take large volumes of unstructured data, and in that respect MongoDB is very valuable. “No throwing, no deletes, no updates, just inserting and keeping,” he said, describing the company’s process.
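The "just inserting and keeping" approach can be sketched as an append-only store that accepts events of any shape. This is a minimal in-memory illustration, not WebMD's code and not a MongoDB API; the class name `EventLog`, the `received_at` field, and the sample events are all assumptions made for the example.

```python
from datetime import datetime, timezone


class EventLog:
    """Append-only store: events can be inserted and read, never changed."""

    def __init__(self):
        self._events = []

    def insert(self, event):
        # Copy the event so later changes by the caller do not alter the log.
        record = dict(event)
        record["received_at"] = datetime.now(timezone.utc).isoformat()
        self._events.append(record)
        return len(self._events) - 1  # position of the stored event

    def find(self, **criteria):
        # Return copies of every event whose fields match all criteria.
        return [
            dict(e)
            for e in self._events
            if all(e.get(k) == v for k, v in criteria.items())
        ]


log = EventLog()
# Events do not need to share the same fields.
log.insert({"type": "like", "page": "/allergies"})
log.insert({"type": "share", "network": "twitter", "count": 2})
likes = log.find(type="like")
```

Because nothing is updated or deleted, the store never has to reconcile conflicting writes, and the decision about what to do with the data can be made later.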

Designing for data


Describing the company’s setup, Dvorkin explained that they use MongoDB together with Oracle, Vertica, and Storm. For analytical purposes, “we move data from Mongo into data warehouses. For reporting, it’s easier to work with databases like Oracle or Vertica.” Storm is a real-time engine for moving data around. “It’s similar to Hadoop,” but better equipped for real time.

Describing the apps the company provides based on the data it gathers, Dvorkin said it had built a communication platform that was “serving ads based on real time user events, events that the user generates.” Once an event occurs, a rule that determines which action to take is executed. “To make it fast, we use MongoDB,” as it has excellent scalability and query support. For BI analytics, Storm comes in: “on the fly, in near real time, we transfer data from MongoDB to Vertica.”
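The event-then-rule flow described above can be sketched in a few lines. This is a hypothetical illustration under simple assumptions (rules are checked in order and the first match wins); the `Rule` class, the `handle_event` function, and the sample rules and ad identifiers are invented for the example and do not come from WebMD.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]  # does this rule apply to the event?
    action: Callable[[dict], str]    # what to do when it applies


def handle_event(event, rules):
    """Run the first rule that matches the event; return None if none do."""
    for rule in rules:
        if rule.matches(event):
            return rule.action(event)
    return None


# Hypothetical rules and ad identifiers.
rules = [
    Rule(
        "allergy-search",
        lambda e: e.get("type") == "search" and "allergy" in e.get("query", ""),
        lambda e: "ad:allergy-relief",
    ),
    Rule(
        "article-share",
        lambda e: e.get("type") == "share",
        lambda e: "ad:newsletter-signup",
    ),
]

chosen = handle_event({"type": "search", "query": "spring allergy symptoms"}, rules)
```

In a setup like the one Dvorkin describes, the rule lookup would need to be fast enough to run on every user event, which is where the database's query performance matters.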


Explaining how MongoDB is developer friendly, Dvorkin said that “in general, it’s less code.” App development with relational databases is done with object-oriented languages. Objects are hard to translate to relational tables. It can be achieved with object-relational mapping frameworks like Hibernate, but it’s “still massive knowledge to use it, big chunk of work.” MongoDB stores documents as objects natively, so developers don’t have to transform objects from their programming language into what the storage supports. Thus it “removes a whole layer of complexity.”
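A short sketch shows why objects map onto documents with so little code. Here a nested Python object becomes a dictionary-shaped document in a single standard-library call, with no table schema or join mapping to define. The `UserEvent` and `Device` classes and their fields are hypothetical.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Device:
    os: str
    model: str


@dataclass
class UserEvent:
    user_id: str
    action: str
    tags: list = field(default_factory=list)
    device: Optional[Device] = None


event = UserEvent("u42", "share", ["health", "fitness"], Device("iOS", "iPhone 5"))

# asdict() walks nested dataclasses and lists, producing a plain nested dict.
document = asdict(event)
```

A dictionary like `document` is the shape a document database stores directly; with a driver such as PyMongo it could be passed to a collection's `insert_one` method as is. In a relational schema, the same object would typically be split across several tables for the event, its tags, and its device.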

In the NoSQL database space in particular, the document-based structure fits well with object-oriented development. Moreover, MongoDB offers “very rich query capabilities, rich data structure, indexable and searchable.”

On mobile

Asked how mobile affects his work, Dvorkin said mobile generates a lot more traffic and that this trend will continue, which is why WebMD releases a lot of mobile apps. The main challenge is testing for mobile: with so many different operators, devices, and operating systems, there is a growing need for test automation. Testing tools are getting better, and testing-as-a-service offerings may provide a solution, especially in the performance space.