How to Write an AB Testing Hypothesis That Actually Drives Impact

Predicting the result before the test runs isn’t rigour. It’s confirmation bias with a confidence interval attached.

Most e-commerce teams have a hypothesis template. It usually looks something like this: “We believe that [change] will cause [result] because [reason].” It has the right shape. It references data. It proposes a change. And it is almost a hypothesis, which is precisely why it is so damaging.

The moment you state the expected result in your AB testing hypothesis, you have introduced confirmation bias before a single user has seen your variant. The team is now looking for evidence that the predicted outcome occurred, not looking honestly at what the data shows. Results that confirm the prediction get accepted. Results that contradict it get interrogated. That is not experimentation. It is storytelling dressed in the language of science.

What an AB Testing Hypothesis Is Actually For

A hypothesis is not a prediction.

It is a structured argument for why a change is worth testing. Its job is to force intellectual honesty before the test runs, to make the team articulate what problem they are solving, why they believe the proposed change addresses it, and what they expect to learn regardless of the outcome.

The distinction matters because it changes what a “failed” test means. If your hypothesis predicts a result and the result doesn’t materialise, you learn only that the prediction was wrong. If your hypothesis poses a solution rooted in observed customer failure and the test doesn’t win, you learn that the solution didn’t address the problem in the way you expected, which is genuinely useful information that sharpens the next iteration.

Good hypothesis construction is the difference between an experimentation programme that accumulates knowledge and one that accumulates activity. For lean e-commerce teams with limited traffic and limited resource, that distinction is a commercial one.

The Problem With ‘We Believe This Will Improve Conversion’

The standard hypothesis format, “we believe X will cause Y”, conflates three things that should always be kept separate: the observation, the inference, and the proposed change.

The observation is what your data shows is happening, stated as fact. The inference is the logical connection between the observation and the proposed change, the intellectual work that explains why the change is relevant to the failure. The proposed change is what you are testing. These are three distinct steps, and collapsing them into a single “we believe” statement skips the most important one: the inference.

Consider the difference between these two framings:

“We believe that improving the visibility of PLP filters will reduce drop-off.”

And:

"We know from session data that 43% of mobile users on our PLP scroll fewer than two product rows before leaving, and from exit survey data that 31% of abandoners cite ‘couldn’t find what I was looking for’ as the reason. Interaction with PLP filters is associated with increased scroll depth and greater click-through rate. We should therefore seek to improve the visibility of, and encourage engagement with, these filters."

The first version is a guess. It names a change and assumes a result. The second version is an argument. It states what the data shows, draws a logical inference from it, and proposes a change without presuming what the test will find. One is a prediction. The other is a question worth answering.
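To make the separation concrete, the sketch below stores the three steps as separate fields, populated with the PLP filter example above. The `Hypothesis` class and its field names are hypothetical and not part of any testing tool. The point of the design is that there is deliberately no field for the expected result.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    """Hypothetical record keeping observation, inference and change separate.

    There is deliberately no field for the predicted outcome.
    """
    observations: list[str]   # what the data shows, stated as fact
    inference: str            # why the change is relevant to that failure
    proposed_change: str      # what the variant actually changes
    metrics: list[str] = field(default_factory=list)  # signals to measure

    def is_complete(self) -> bool:
        # Every part must be filled in; a blank inference is the usual gap.
        return bool(self.observations and self.inference.strip()
                    and self.proposed_change.strip() and self.metrics)

    def render(self) -> str:
        lines = ["Observation:"]
        lines += [f"  - {o}" for o in self.observations]
        lines.append(f"Inference: {self.inference}")
        lines.append(f"Proposed change: {self.proposed_change}")
        lines.append("Metrics: " + ", ".join(self.metrics))
        return "\n".join(lines)


plp_filters = Hypothesis(
    observations=[
        "43% of mobile PLP users scroll fewer than two product rows before leaving",
        "31% of abandoners cite 'couldn't find what I was looking for' in exit surveys",
    ],
    inference="Filter interaction is associated with deeper scroll and higher click-through",
    proposed_change="Improve the visibility of, and encourage engagement with, PLP filters",
    metrics=["scroll depth on mobile PLP", "filter engagement rate",
             "add-to-cart rate from PLP on mobile"],
)
print(plp_filters.render())
```

A template like this makes a missing inference visible: a hypothesis that jumps straight from "we believe" to a change simply leaves that field empty.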

The Five Principles of a Good AB Testing Hypothesis

A well-constructed AB testing hypothesis in e-commerce meets five criteria. Each one is doing a specific job.

It must be backed by data and insight.

A hypothesis is only as strong as the evidence it rests on. Behavioural data, session recordings, on-site engagement metrics, exit surveys and post-purchase feedback: these are the sources that surface real customer failure. A hypothesis built from ‘we think’ or ‘best practice suggests’ is not backed by data. It is backed by assumption.

It must address the customer’s objectives and challenges, not those of the business.

The business wants higher conversion. The customer wants to find the right product, understand whether it meets their needs, and feel confident enough to buy. These are not the same thing. A hypothesis framed around the customer’s failure, what they are trying to do that the current experience prevents, will produce tests that actually move commercial performance. A hypothesis framed around the business’s KPIs will produce tests that optimise the appearance of performance without addressing its root cause.

It must pose a solution but not propose the impact.

State what you are testing. Do not predict what will happen. The test exists precisely because you do not know the answer. Predicting the outcome in the hypothesis pre-loads the team with a result to confirm, and makes inconclusive or negative findings feel like failures rather than data.

It must be measurable.

The hypothesis should name the metric or metrics that will tell you whether the change addressed the failure. Not ‘conversion will improve’: that is an outcome, not a measure. Instead, name signals such as ‘scroll depth on mobile PLP’, ‘filter engagement rate’ or ‘add-to-cart rate from PLP on mobile’, which are specific, measurable and connect directly to the failure the hypothesis identifies.
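As an illustration of what naming the metric means in practice, here is a minimal sketch that computes the three PLP signals above for each arm of a test. The `Session` record, its fields and the sample data are assumptions about how session data might be exported, not any particular analytics platform's schema.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Session:
    # Hypothetical per-session record for a mobile PLP visit.
    variant: str          # "control" or "variant"
    rows_scrolled: int    # product rows scrolled past on the PLP
    used_filter: bool     # interacted with at least one filter
    added_to_cart: bool   # added a product to basket from the PLP


def plp_metrics(sessions: list[Session], variant: str) -> dict[str, float]:
    """Compute scroll depth, filter engagement and add-to-cart rate for one arm."""
    group = [s for s in sessions if s.variant == variant]
    if not group:
        raise ValueError(f"no sessions for variant {variant!r}")
    n = len(group)
    return {
        "sessions": n,
        "mean_rows_scrolled": mean(s.rows_scrolled for s in group),
        "filter_engagement_rate": sum(s.used_filter for s in group) / n,
        "add_to_cart_rate": sum(s.added_to_cart for s in group) / n,
    }


# Tiny invented sample, for illustration only.
sessions = [
    Session("control", 1, False, False),
    Session("control", 3, True, True),
    Session("variant", 4, True, False),
    Session("variant", 5, True, True),
]
for arm in ("control", "variant"):
    print(arm, plp_metrics(sessions, arm))
```

Reporting these signals side by side is what lets a team see, for example, that filter engagement moved while add-to-cart rate did not.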

It must open the door for further experimentation.

A good hypothesis, whether the test wins or loses, should generate the next question. If improving filter visibility increases scroll depth but not add-to-cart rate, the programme now knows something new: the discovery problem is not the only barrier. That’s a more valuable outcome than a positive result that tells you nothing about why it worked.

What AB Testing Statistics Actually Tests: Null and Alternate Hypotheses

There is a second layer to hypothesis construction that most e-commerce teams never engage with, because it is treated as the statistician’s problem rather than the team’s. Understanding it changes how you write hypotheses and how you interpret results.

When an AB test runs, the statistical engine is not testing whether your proposed change works. It is testing two competing propositions simultaneously.

The null hypothesis is the assumption that there is no difference between the control and the variant, that any observed difference in conversion rate, scroll depth, or add-to-cart rate is the result of random variation rather than the change you made. The null hypothesis is the default position. It is what the test assumes to be true until the data says otherwise.

The alternate hypothesis is the proposition that a real difference exists, that the change you made produced an effect that is unlikely to be explained by chance alone.
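One way to make the null hypothesis concrete is to simulate it. If the change made no difference, the label on each session (control or variant) is arbitrary, so shuffling the labels many times shows how large a difference random variation alone tends to produce. The sketch below is a generic permutation test on per-session outcomes; the function name and defaults are illustrative, and it is a teaching sketch rather than what any particular testing tool runs.

```python
import random


def permutation_p_value(control: list[int], variant: list[int],
                        n_shuffles: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test on the difference in means.

    control / variant hold per-session outcomes, e.g. 1 = added to cart, 0 = did not.
    """
    rng = random.Random(seed)
    observed = abs(sum(variant) / len(variant) - sum(control) / len(control))
    pooled = control + variant        # new list, so the inputs are not mutated
    n_control = len(control)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        # Under the null hypothesis the labels are arbitrary, so reshuffle them.
        rng.shuffle(pooled)
        fake_control, fake_variant = pooled[:n_control], pooled[n_control:]
        diff = abs(sum(fake_variant) / len(fake_variant)
                   - sum(fake_control) / len(fake_control))
        if diff >= observed:
            at_least_as_extreme += 1
    # Share of "no real effect" worlds producing a difference this large or larger.
    return at_least_as_extreme / n_shuffles
```

The returned value is the share of shuffles that produced a difference at least as large as the one observed, which is the quantity a 95% significance threshold is compared against.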

Statistical significance at the 95% confidence threshold most teams use means that, if the null hypothesis were true, you would see a difference at least this large by chance less than 5% of the time. Reaching significance means the data is inconsistent enough with the null hypothesis that you can reasonably reject it in favour of the alternate.
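For a conversion-style metric, a common way to run this calculation is a two-proportion z-test. The sketch below assumes samples large enough for the normal approximation to hold, and the conversion counts in the example are invented for illustration.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int,
                          conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: both arms share one conversion rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)   # best estimate of the rate if H0 is true
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) for a standard normal
    return z, p_value


ALPHA = 0.05  # the conventional "95% confidence" threshold
z, p = two_proportion_z_test(conv_a=400, n_a=10_000, conv_b=460, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.3f}, reject H0: {p < ALPHA}")
```

Here `p` is the probability of a difference at least this large if the null hypothesis were true; comparing it to 0.05 is what the 95% threshold means in practice.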

This has a direct implication for how you construct your AB testing hypothesis. If you write your hypothesis as a prediction of outcome, “this change will increase conversion”, you are framing it as an alternate hypothesis. The test will tell you whether the data supports that prediction, and nothing else. But if you write your hypothesis as a structured observation and inference, rooted in a specific customer failure and a logical argument for why the proposed change addresses it, the test becomes genuinely informative regardless of which hypothesis the data supports.

A test that fails to reject the null hypothesis is not a waste if the hypothesis was well-constructed. It tells you that the intervention didn’t address the failure in the way you expected, which refines your understanding of the problem. A passing test on a poorly constructed hypothesis tells you only that something changed, not why, not whether it will hold, and not what to do next.

Why Poor Hypotheses Produce Expensive Experimentation Programmes

The commercial cost of weak hypothesis construction is not visible test by test. It accumulates. An experimentation programme built on ‘we believe X will improve Y’ hypotheses generates a pipeline of tests that may individually win or lose, but collectively tell the team very little about why customers are failing to buy.

Results that confirm predictions get released and reported upward. Results that don’t get quietly retired. The underlying failure patterns, the real reasons customers abandon on the product page, in the basket or at checkout, remain undiagnosed, because the programme was never designed to surface them. What you are left with is a testing programme that is busy but not diagnostic. Academic wins. Activity without commercial progress.

For lean e-commerce teams, where every test consumes meaningful resource and traffic is finite, this is not a theoretical problem. A poorly framed hypothesis that runs for four weeks on insufficient traffic, reaches no significance, and tells you nothing is four weeks of opportunity cost. The question you should have been asking, the specific customer failure that your data was pointing at, went unaddressed.
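Whether four weeks is enough can be estimated before the test starts. The sketch below uses the standard normal-approximation sample size formula for comparing two proportions; the baseline rate, minimum detectable effect and weekly traffic in the example are hypothetical placeholders, not benchmarks.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline: float, relative_mde: float,
                            alpha: float = 0.05, power: float = 0.8) -> int:
    """Sessions needed in EACH arm to detect a relative lift of relative_mde."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # chance of detecting a real effect
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def weeks_needed(n_per_variant: int, weekly_sessions: int, n_variants: int = 2) -> float:
    """Weeks of eligible traffic required to fill every arm."""
    return n_per_variant * n_variants / weekly_sessions


# Hypothetical inputs: 3% baseline, 10% relative lift, 20,000 eligible sessions a week.
n = sample_size_per_variant(baseline=0.03, relative_mde=0.10)
print(n, round(weeks_needed(n, weekly_sessions=20_000), 1))
```

With those made-up inputs the formula asks for roughly 53,000 sessions per arm, a little over five weeks of traffic, so a test scheduled for four weeks would be underpowered for that effect before it started.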

Building an AB Testing Hypothesis Framework That Learns

The shift from prediction-based to evidence-based hypothesis construction is not complicated. It requires three things: a clear articulation of what the data shows (the observation), an honest argument for why the proposed change is relevant to that failure (the inference), and a deliberate decision not to state the expected outcome (the discipline).

Applied consistently, this approach transforms what an experimentation programme produces. Tests that win tell you which solutions address specific, evidenced customer failures. Tests that lose tell you that the failure is real but the solution needs refining. Both outcomes advance understanding. Neither is wasted.

An AB testing hypothesis framework built on these principles is the foundation of a testing programme that actually moves revenue. Not because it wins more tests, but because it asks better questions, and in digital experimentation, the quality of the question determines the value of the answer.

Our mission

To combine expertise in data, insight and the scientific method, working with ambitious digital organisations to challenge, inform and support teams to deliver the greatest commercial impact from every investment in digital channels.
