ARTICLE · 7 MIN · FOR CRO MANAGERS · FOR GROWTH LEADERS

How much traffic do you need to A/B test?

How much traffic do you need to A/B test? It's the first question every team should ask and the one most skip. The honest answer isn't a round number — it depends on your baseline conversion rate, the size of the change you're trying to detect, and how confident you want to be when you call a winner.

◆ TL;DR

▸There is no universal traffic threshold. Required sample size is a function of your baseline conversion rate, your minimum detectable effect, and your significance and power targets.
▸Lower baseline conversion and smaller expected effects both push the required sample size up — often by a lot.
▸Most sites underestimate the number, start tests that can never reach significance, and then read noise as a result.
▸Size the test before you run it. If the math doesn't close in a reasonable window, that's a signal to do something other than test.

01 The real answer

It depends on three inputs, not one.

When someone asks how much traffic you need to A/B test, they want a single number. The honest answer is that the number falls out of three inputs: your baseline conversion rate (what the control does today), your minimum detectable effect (the smallest improvement worth catching), and your statistical thresholds (typically a 5% significance level, i.e. 95% confidence, and 80% power). Change any one of those and the required sample size moves. A high-traffic page testing for a tiny lift can need more visitors than a low-traffic page testing for a large one.

▸Baseline conversion rate — the current rate you're trying to beat.
▸Minimum detectable effect — the smallest lift you care about detecting.
▸Significance level (usually 5%, i.e. 95% confidence) — your tolerance for a false positive.
▸Power (usually 80%) — your odds of detecting a real effect at least as large as your minimum detectable effect.
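To make the three inputs concrete, here is one widely used normal-approximation formula for a two-sided test comparing two conversion rates. Treat it as a sketch rather than the only method, since calculators differ on details such as continuity corrections. Here p_1 is the baseline rate, p_2 = p_1(1 + MDE) is the rate you want to be able to detect (with the MDE written as a relative lift), α is the significance level, 1 − β is the power, and n is the number of visitors needed in each variant.

```latex
n = \frac{\left( z_{1-\alpha/2}\sqrt{2\bar{p}(1-\bar{p})}
      + z_{1-\beta}\sqrt{p_1(1-p_1) + p_2(1-p_2)} \right)^2}{(p_2 - p_1)^2},
\qquad \bar{p} = \frac{p_1 + p_2}{2}
```

With α = 0.05 and 80% power, z_{1−α/2} ≈ 1.96 and z_{1−β} ≈ 0.84. The difference p_2 − p_1 is squared in the denominator, which is why the minimum detectable effect has such an outsized pull on the answer.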

02 Why the number is bigger than you think

Low baselines and small effects are expensive.

Two things quietly inflate the sample size. The first is a low baseline conversion rate: at 2%, a thousand visitors produce about 20 conversions versus about 200 at 20%, so you need more traffic to separate signal from noise. The second is a small minimum detectable effect: insisting on catching a 2% relative lift requires dramatically more traffic than being willing to ship only on a 10% lift. Because the required sample size scales with the inverse square of the effect, halving the effect you want to detect roughly quadruples the sample size. Most teams set both of these against themselves, pairing a low baseline with a small target lift, and then wonder why the test never resolves.

▸Lower baseline conversion → more traffic required.
▸Smaller minimum detectable effect → much more traffic required.
▸Detecting half the effect costs roughly four times the sample.
▸Each variant needs the full per-variant sample, so every added variant increases the total traffic required.
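The same arithmetic as a small Python sketch. The function name `sample_size_per_variant` is hypothetical (not taken from any particular tool); it treats the minimum detectable effect as a relative lift and uses the standard two-proportion normal approximation for a two-sided test. Dedicated calculators may apply corrections and return slightly different numbers.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Visitors needed in EACH variant for a two-sided, two-proportion test.

    Hypothetical helper: normal approximation, no continuity correction.
    baseline     -- control conversion rate, e.g. 0.05 for 5%
    relative_mde -- smallest relative lift worth detecting, e.g. 0.10 for +10%
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)  # the rate we want to be able to detect
    if not (0 < p1 < 1 and 0 < p2 < 1 and p1 != p2):
        raise ValueError("rates must be in (0, 1) and the MDE must be non-zero")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    p_bar = (p1 + p2) / 2

    spread = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
              + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return ceil(spread ** 2 / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Halving the MDE: roughly four times the visitors.
    ten = sample_size_per_variant(0.05, 0.10)
    five = sample_size_per_variant(0.05, 0.05)
    print(f"5% baseline, +10% lift: {ten:,} per variant")
    print(f"5% baseline, +5% lift:  {five:,} per variant ({five / ten:.1f}x)")

    # Lower baseline: same relative lift, far more visitors.
    for base in (0.20, 0.02):
        n = sample_size_per_variant(base, 0.10)
        print(f"{base:.0%} baseline, +10% lift: {n:,} per variant")

    # More variants: each one needs the full per-variant sample.
    for variants in (2, 3, 4):
        print(f"{variants} variants: {variants * ten:,} visitors in total")
```

With the defaults, a 5% baseline and a 10% relative MDE works out to a little over 31,000 visitors per variant; dropping the baseline to 2% pushes the same relative lift to around 80,000.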

A test that can't reach significance in a reasonable window isn't a small test — it's a non-test. It will produce a number, but the number won't mean anything.

03 The consequence

Underpowered tests don't fail loudly — they mislead.

The damage from underestimating traffic isn't that the test errors out. It's that it runs, shows an early swing, and tempts you to call it. Peeking at an underpowered test and stopping on a hopeful day is how teams ship changes that do nothing — or quietly hurt. The fix isn't more discipline at the finish line; it's honest math at the start. Decide the required sample size and duration before launch, and commit to running to it.

▸Underpowered tests still display a winner — it's just noise.
▸Early swings tempt premature calls and false confidence.
▸Calling tests early inflates the false-positive rate, so your apparent win rate overstates the gains you actually keep.
▸Pre-committing to sample size and duration removes the temptation.
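To see why early calls mislead, the sketch below simulates A/A tests, where both arms share the same true conversion rate, so every "winner" is a false positive. It compares a fixed-horizon analysis (one look at the pre-committed end) with a peeker who runs a pooled two-proportion z-test after every batch of traffic and would call the test the first time it crosses the 95% threshold. The traffic sizes, number of looks, seed, and function names are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist


def z_score(conv_a, conv_b, n):
    """Pooled two-proportion z statistic for two arms of n visitors each."""
    p_pool = (conv_a + conv_b) / (2 * n)
    se = sqrt(p_pool * (1 - p_pool) * (2 / n))
    return 0.0 if se == 0 else (conv_b - conv_a) / n / se


def aa_false_positive_rates(n_sims=400, looks=10, batch=200, rate=0.10,
                            alpha=0.05, seed=7):
    """Share of A/A tests declared 'significant' with and without peeking.

    Hypothetical setup: both arms convert at `rate`, and each look adds
    `batch` visitors per arm.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    fixed_hits = peek_hits = 0

    for _ in range(n_sims):
        conv_a = conv_b = n = 0
        crossed_at_any_look = False
        for _ in range(looks):
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            n += batch
            if abs(z_score(conv_a, conv_b, n)) > z_crit:
                crossed_at_any_look = True  # a peeker would stop and ship here
        # Fixed horizon: only the final, pre-committed look counts.
        if abs(z_score(conv_a, conv_b, n)) > z_crit:
            fixed_hits += 1
        if crossed_at_any_look:
            peek_hits += 1

    return fixed_hits / n_sims, peek_hits / n_sims


if __name__ == "__main__":
    fixed, peeking = aa_false_positive_rates()
    print(f"false positives, one look at the end:         {fixed:.1%}")
    print(f"false positives, peeking after every batch:   {peeking:.1%}")
```

The fixed-horizon rate should land near the nominal 5%, while the peeking rate typically comes out well above it; the exact figures depend on the seed and on how often you look.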

04 What to do about it

Size the test first — then decide whether to run it.

Run the math before you build the variant. If your weekly traffic and baseline conversion put a meaningful test within a few weeks, go. If the honest sample size is months away, that's not a reason to lower your standards — it's a reason to do different work. Drive more qualified traffic, ship the obvious UX fixes that don't need a test to justify them, and come back to experimentation when the page can actually carry a test. Flight Path inside Optimize Pilot makes that call for you: it detects when a page is too quiet to test and routes you to growth work until the traffic is there.

▸Estimate required sample size and duration before building anything.
▸If the test resolves in weeks, run it; if it's months out, redirect the effort.
▸Low traffic is a reason to grow traffic, not to ship untested guesses.
▸Let test-readiness gate the queue, so you only run experiments that can finish.
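The go/no-go call above reduces to a little arithmetic. This sketch assumes every visitor to the page enters the test, traffic splits evenly across variants (control included), and durations round up to whole weeks so each weekday is represented evenly. The six-week cutoff and the function names are hypothetical placeholders for "a few weeks", not a rule from any tool; the per-variant sample size would come from a sample size calculator.

```python
from math import ceil


def weeks_to_finish(n_per_variant, variants, weekly_visitors):
    """Whole weeks needed for every variant to reach its sample size."""
    total_needed = n_per_variant * variants  # control counts as a variant
    return ceil(total_needed / weekly_visitors)


def readiness(n_per_variant, variants, weekly_visitors, max_weeks=6):
    """Return (weeks, decision). max_weeks is a hypothetical cutoff."""
    weeks = weeks_to_finish(n_per_variant, variants, weekly_visitors)
    decision = "run" if weeks <= max_weeks else "redirect"
    return weeks, decision


if __name__ == "__main__":
    # Roughly what a 5% baseline and a +10% relative MDE need per variant
    # at 95% confidence and 80% power.
    for weekly in (50_000, 10_000, 2_000):
        weeks, decision = readiness(31_000, variants=2, weekly_visitors=weekly)
        print(f"{weekly:>6,} visitors/week -> {weeks:>2} weeks: {decision}")
```

The same test that finishes in two weeks on a busy page takes seven weeks at 10,000 weekly visitors, which is exactly the point where redirecting effort to traffic or obvious fixes beats waiting.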

◆ KEEP GOING

A/B Test Sample Size Calculator: plug in your baseline conversion rate, minimum detectable effect, and weekly traffic to get the required sample size per variant and an estimated duration.
Flight Path: see how Optimize Pilot detects when a page is too quiet to test and routes you to traffic work first, then unlocks experimentation when the math closes.

◉ DO THE MATH FIRST

Size the test. Then decide.

Run your numbers through the sample size calculator before you build a variant. If the test can't finish, Flight Path will tell you what to do instead.

Book a 15-min stack audit →