Twitter Open Sources Its Secret MySQL Hacks

Despite its reputation for being a big user of and contributor to open source NoSQL projects (including Hadoop, Cassandra, Storm and FlockDB), Twitter is highly dependent on MySQL. According to a blog post from Twitter DBA and DB development team members Jeremy Cole and Davi Arnaut: “MySQL is the persistent storage technology behind most Twitter data: the interest graph, timelines, user data and the Tweets themselves.”

As you can imagine, Twitter has put a lot of work into making MySQL scale. And now the company has put its MySQL modifications on GitHub, under a BSD license, for all to use.

According to the blog entry, the work includes:

  • Add additional status variables, particularly from the internals of InnoDB. This allows us to monitor our systems more effectively and understand their behavior better when handling production workloads.
  • Optimize memory allocation on large NUMA systems: Allocate InnoDB’s buffer pool fully on startup, fail fast if memory is not available, ensure performance over time even when server is under memory pressure.
  • Reduce unnecessary work through improved server-side statement timeout support. This allows the server to proactively cancel queries that run longer than a millisecond-granularity timeout.
  • Export and restore InnoDB buffer pool using a safe and lightweight method. This enables us to build tools to support rolling restarts of our services with minimal pain.
  • Optimize MySQL for SSD-based machines, including page-flushing behavior and reduction in writes to disk to improve lifespan.

The DBAs write, “We look forward to sharing our work with upstream and other downstream MySQL vendors, with a goal to improve the MySQL community,” so hopefully we’ll see some of these improvements rolled into official MySQL releases.

See also: how Facebook scales with MySQL.