Julien Simon is the vice president of engineering for Criteo, a French firm that works with retailers to serve personalized banner ads to consumers. He recently stopped by theCube to discuss how his company uses MongoDB to turn petabytes of data into clicks and conversions.
Simon tells co-hosts Dave Vellante and Jeff Kelly that Criteo uses 10gen’s database to store product catalogs from more than 3,000 major e-commerce sites. The executive sees MongoDB as the “starting point” of Criteo’s environment because it handles the data that is fed into the company’s homegrown analytical platform. That platform consists of several components, including real-time algorithms that determine whether the potential value of a given ad space merits its price.
Simon boasts that Criteo recorded a 200 percent increase in revenue over the last five years. Today, the company serves clients in over 30 markets and maintains offices in 15. It also operates seven data centers: three in Europe, two in the United States, and two in Japan.
Criteo deployed MongoDB in early 2011 after its old Microsoft SQL Server environment started struggling under the weight of retailers’ growing datasets. Simon says that Criteo evaluated a number of NoSQL databases before it opted to go with 10gen’s, which proved to be simpler and more scalable than the alternatives.
Scalability is a big deal for Criteo. The heart of the company’s environment is a multi-petabyte Hadoop cluster that grows at a rate of 20 terabytes a day. Simon explains that the system evaluates every single ad impression in order to optimize user engagement.
Criteo uses different data stores to handle analytical and transactional workloads. Vellante points out that a number of vendors are working on databases that can handle both, but Simon dismisses these products as mere “attempts.” In his view, the industry still has a lot of catching up to do.
Click the video below for the full interview.