UPDATED 14:00 EDT / JULY 03 2013

Balancing Design Instinct + Judgment vs. A/B Testing for Design Changes

Editor's Note: The following is an excerpt from Joel Lewenstein, Product Designer at Quora, in response to this question on Quora, as told by Ryan Cox: How do designers at Twitter, Facebook, and other tech companies balance their design instinct and judgment vs. the engineer's impulse to use A/B test data for every design change?

For additional context, the question refers specifically to environments where every pixel change, line height, shade of color, etc. is A/B tested and most decisions are made on Big Data. As designers, how do you navigate these environments? Can they make you doubt your design judgment, or doubt the effectiveness of Big Data? How do you get across the idea that design is more than the sum of its parts?

This is a great question, one that’s becoming especially relevant as more companies (like Quora [1]) take a data-driven approach to product development. At its core, this isn’t a question about “Designers vs Engineers” (though they may be the stereotypical vanguards of data-driven and intuition-driven decision making), but rather a more general one: How do you make product decisions using both empirical data (statistics, usage, test results) and qualitative data (prior experience, intuition, beliefs about human nature)?

It helps to think about this tension in two buckets: one where data confirms intuition, and one where it doesn’t. To make this concrete, imagine a new feature being designed for a product, which is first A/B tested to a subset of users. Team members make predictions about the outcome of the test, based on their knowledge and experience, then compare those intuitions to the data. Simplifying slightly, here are four potential scenarios:

[Quora screenshot: a 2x2 matrix crossing the team’s prediction (positive or negative impact) against the test data (positive or negative impact), yielding Scenarios I through IV: in I and IV intuition and data agree; in II and III they disagree.]
In half the scenarios (I and IV), intuition and data agree: The people working on the feature predict that there will be positive or negative impact, and the data supports that prediction. [2] There are a number of meaningful benefits to these scenarios that are worth calling out explicitly, as they set the stage for the more difficult scenarios of disagreement.

Intuition is built from data.

The fabric of qualitative predictions, intuition and gut instinct, is woven from threads of concrete data findings. A theory of human behavior (e.g. “People respond to short, simple language”) might one day be part of your knowledge and experience. But forming that intuition is best done through a series of rigorous, focused tests. Even confirming existing theories through data helps build confidence in what you think you know, exposes nuance, and confirms that they hold true across multiple contexts.

Quantifying impact helps make tradeoffs.

Even the best qualitative theory rarely includes predictions about degree of impact. Testing and confirming an improvement yields a measured outcome that can be weighed against other potential changes, and balanced against the work necessary to build and support that change.
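
To make that concrete, here is a minimal, purely illustrative sketch of the tradeoff math: the lifts and effort estimates below are invented, not real Quora numbers, but ranking candidate changes by measured impact per unit of effort is one simple way to use a quantified result.

    # Illustrative only: hypothetical lifts and effort estimates, not real data.
    # A measured improvement can be weighed against the cost to build and support it.
    candidates = [
        # (change, measured lift in weekly signups, engineer-weeks to build/support)
        ("pricing info on homepage", 400, 2.0),
        ("shorter signup copy",      120, 0.5),
        ("redesigned onboarding",    900, 8.0),
    ]

    # One simple framing of the tradeoff: impact per unit of effort.
    for name, lift, weeks in sorted(candidates, key=lambda c: c[1] / c[2], reverse=True):
        print(f"{name}: {lift / weeks:.0f} extra weekly signups per engineer-week")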

Confirming impact keeps you sane.

The question asks this specifically, and it’s true that having your intuition challenged by data is confidence-shaking, so it’s worth calling out the converse here: Having data confirm your prediction feels good, gives you faith in your gut, and can help power you through the WTF!? moments that future tests will surely deliver.

When data confirms intuition, it’s an easier outcome to feel good about and move forward with. Harder to reconcile is when the data disagrees with intuition, i.e. Scenarios II and III. Underneath the inherent frustration and bewilderment that comes with these outcomes, there can be tremendous learning opportunities:

Unexpectedly bad data forces a reconsideration of assumptions.

When I encounter an Intuition False Positive, my first instinct is always to reexamine every assumption baked into a design. Why exactly did I believe in this change? What theory underlies that prediction? What are all the steps I’m expecting will happen, and are they actually happening? Are people not starting a flow? Not finishing it? Using it perfectly, but cannibalizing usage from another part of the site? Design is surely “more than the sum of its parts”, but if a design isn’t working, it’s often one of the parts that is broken.

Unexpectedly good data exposes new opportunities for impact.

When I find myself in an Intuition False Negative, it’s usually because I’ve overlooked a potentially meaningful area of impact. I may not believe that a language change, or button color, or new UI adornment will have any impact, but when it does, it forces me to consider what else I’ve overlooked. What are the biggest dead ends in a flow? What seems simple to me but is actually confusing to new users? What information might people need in this scenario that they don’t have? In short, what are all the possible pressure points that can be moved to achieve the goals of a product? Really thoughtful consideration of these results, without biases or assumptions, creates fertile ground for brainstorming new ideas with potentially high impact.

Every time a product change produces unexpected data, I learn an enormous amount about the specific feature that’s been built, how people use a product, and future opportunities for impact. Ultimately these learnings create more future occurrences of Confirmed Intuition. It can be a painful road, as any challenging learning opportunity is, but the best lessons often come from these moments [3].

To really reap the benefits of this data-driven approach to product development, (at least) two things are critical:

Test Falsifiable, Named Hypotheses

To learn from data and strengthen intuition, it’s critical to be able to draw lessons from results: if data says that “The test group performed 5% more key actions than the control group”, what abstract learning can be drawn from this? Having this concise takeaway is critical for using these results in the future. Test groups should be conceived to test as few variables as possible, and those variables should have clear names: “I think people will sign up for a service that seems affordable, so I’m going to put our price information on the homepage.” If signups increase, then the lesson should be that pricing information increases conversion [4].
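
The article doesn’t describe the specific statistics Quora uses, but as a rough sketch of how a result like “the test group performed 5% more key actions than the control group” might be checked for significance, a basic two-proportion z-test looks something like this (every count below is hypothetical):

    # A minimal two-proportion z-test sketch; all counts here are made up.
    from math import sqrt
    from statistics import NormalDist

    control_signups, control_visitors = 1000, 20000   # hypothetical control group
    test_signups, test_visitors       = 1100, 20000   # hypothetical "pricing on homepage" group

    p1 = control_signups / control_visitors
    p2 = test_signups / test_visitors
    pooled = (control_signups + test_signups) / (control_visitors + test_visitors)
    se = sqrt(pooled * (1 - pooled) * (1 / control_visitors + 1 / test_visitors))
    z = (p2 - p1) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))       # two-sided p-value

    print(f"control {p1:.2%}, test {p2:.2%}, z = {z:.2f}, p = {p_value:.4f}")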

If the test is “I’m going to try a bunch of different line-heights to see which one converts best,” and one of them does, you’re not left with a useful lesson: Is “line-height: 1.85em converts best” really a finding you can extrapolate to other scenarios? What is the hypothesis for why this is true? Is it even true at all? (If you’re testing at a 95% confidence level, roughly 1 out of 20 tests will show a positive result even when there is no real difference.) Without thoughtful test design, your confounding results won’t strengthen your intuition moving forward; instead they will leave you paralyzed, wanting to reap the benefits of a test you don’t really understand.
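
The “1 out of 20” point is easy to see in a quick simulation: if you run many variants that are truly no different from control and accept any one that clears a naive 5% significance bar, a spurious “winner” shows up most of the time. This sketch reuses the same z-test as above; the conversion rate, traffic numbers, and test procedure are all assumptions for illustration.

    # Simulate many experiments where 20 variants are truly identical to control,
    # and count how often at least one still looks "significant" at the 5% level.
    import random
    from math import sqrt
    from statistics import NormalDist

    random.seed(0)
    rate, visitors, variants, experiments = 0.05, 2000, 20, 200

    def conversions(n, p):
        # Number of converting visitors out of n, each converting with probability p.
        return sum(random.random() < p for _ in range(n))

    def significant(a, b, n):
        # Two-proportion z-test at the 5% level, as in the sketch above.
        pooled = (a + b) / (2 * n)
        se = sqrt(pooled * (1 - pooled) * (2 / n))
        z = (b / n - a / n) / se
        return 2 * (1 - NormalDist().cdf(abs(z))) < 0.05

    false_winners = 0
    for _ in range(experiments):
        control = conversions(visitors, rate)
        if any(significant(control, conversions(visitors, rate), visitors)
               for _ in range(variants)):
            false_winners += 1

    print(f"experiments with a spurious 'winning' variant: {false_winners / experiments:.0%}")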

Accept that Testing and Product Development are Non-Linear

A concern with frequent A/B testing of small iterations is that implementing each positive change will lead to a piecemeal, incoherent design, that doesn’t respect (in the words of the question) that “design is more than the sum of its parts.” This is surely a concern, but only if test results are considered dogma to be used blindly, without consideration of context. Much better is to see data as information and guidelines to be incorporated into future, thoughtful iterations. Specific test results can inform theories and conversations about behavior, show the high water mark of impact for a potential change, and provide a framework for making tradeoffs between possible product directions. This protects the informative power of data, while maintaining consistent, thoughtful, holistically considered experiences for users.

Data-driven product development, even practiced perfectly, is an emotional roller coaster: The confidence-inspiring highs of clear impact are matched only by the bewildering lows of negative results that question everything you thought you knew about the world. But if tests are well-designed explorations of a theory, if results are used as moments of learning and improvement, and if the eventual feature released is informed rather than dictated by these findings, the result is both a product and a product designer that are better for the process.

[1] Quora is a deeply data-driven company: We have a team of data scientists who deliver nuanced analyses of tests and usage patterns, a group of engineers and designers who believe in using that data thoughtfully, and a company culture of relying on data rather than ego, politics, or emotional investment.
[2] In practice, Scenario I is far more likely than Scenario II: If no one predicts the positive impact from a change, it probably won’t get built.
[3] For a funny and honest look at the experience, see Denial, anger, bargaining, depression, and acceptance: The A/B testing lifecycle.
[4] With the obvious caveat that you’d want further and different tests to confirm the source of these results.
