This is a great move for Facebook, developers, and the data community. As we talk about on SiliconANGLE Wikibon and theCUBE is that Data science is the hottest emerging trend that is creating new value that has never been seen before.
This is a great preview of the upcoming F8 conference on April 30th in SF.
Here is the Facebook news:
When people think about the tools of data science, they often focus on machine learning, statistics, and data manipulation. Modeling massive datasets is indispensable for making predictions — like predicting which set of News Feed stories or search results are most relevant to people. But such models also have limitations in terms of their ability to help with understanding cause-and-effect relationships that lead to building better products and to advancing behavioral science.
Data science needs better tools for running experiments
Despite the abundance of experimental practices in the Internet industry, there are few tools or standard practices for running online field experiments. And existing tools tend to focus on rolling out new features, or automatically optimizing some outcome of interest.
At Facebook, we run over a thousand experiments each day. While many of these experiments are designed to optimize specific outcomes, others aim to inform long-term design decisions. And because we run so many experiments, we need reliable ways of routinizing experimentation. As Ronald Fisher, a pioneer in statistics and experimental design said, “To consult the statistician after an experiment is finished is often merely to ask him to conduct a post-mortem examination. He can perhaps say what the experiment died of.” Many online experiments are implemented by engineers who are not trained statisticians. While experiments are often simple to analyze when done correctly, it can be surprisingly easy to make mistakes in their design, implementation, logging, and analysis. One way to consult a statistician in advance is to have their advice built into tools for running experiments
PlanOut: a framework for running online field experiments
Good tools not only enable good practices, they encourage them. That’s why we created PlanOut, a set of tools for running online field experiments, and are sharing an open source version of it as part of the Data Science Team’s first software release.
Importantly, PlanOut gives engineers and scientists a language for defining random assignment procedures. Experiments, ranging from simple A/B tests, to factorial designs that decompose large interface changes, to more complex within-subjects designs, can be expressed with only a few lines of code. In this way, PlanOut encourages running experiments that are more akin to the kind you see in the behavioral sciences.
Logging can also be a pain-point for many experiments. Logging is often considered separate from the randomization process, which can make it difficult to keep track or even define exactly which units (e.g., user accounts) are in your experiments. This is especially problematic if engineers change experiments by adding or removing additional treatments mid-way through the experiment. PlanOut helps reduce these kinds of errors by automatically logging exposures, and providing a way of tying outcomes to experiments. Finally, this kind of exposure logging increases the precision of experiments, which leads to lower false negatives (“Type II errors”).
A single experiment is rarely definitive. Instead, follow-on experiments might need to be run as development of a new product takes place, or as decision-makers evaluate the effects of a major product rollout. While it is fairly straightforward to run a single experiment, there are few established protocols for how follow-on experiments should be run, and how data for these experiments should be logged. PlanOut includes a management system that organizes experiments, and the parameters they manipulate, into namespaces. This allows distributed teams to work on related features, and launch follow-on experiments in a way that minimizes threats to validity.
The code for PlanOut is now live on GitHub. You can find out more by visiting the PlanOut homepage on Github
We welcome developers and researchers in the open source community who are interested in contributing directly to the original repository, or by forking it.
You can read more about PlanOut “Designing and Deploying Online Field Experiments by Eytan Bakshy, Dean Eckles, Michael Bernstein. “, which we will be presenting at this year’s ACM International Conference on the World Wide Web (WWW 2014):
Latest posts by John Furrier (see all)
- Analysis of the State of OpenStack – OpenStack Silicon Valley - August 9, 2016
- IO data center as a platform conference coming Sept. 13 in Menlo Park - August 9, 2016
- Clarfication on my comments on Cloudera on theCUBE at #HadoopSummit - April 18, 2016