The experiment brief template.
Every test should start from a one-page brief covering hypothesis, variants, sample size, success metric, and guardrails. This template removes the ambiguities that cause tests to stall mid-flight.
- One page. No more. A two-page brief is a symptom of fuzzy thinking.
- The hypothesis section is 80% of the brief's value. Get it right.
- Pre-commit to sample size and analysis plan before the test runs. Otherwise you'll p-hack unconsciously.
- Guardrail metrics prevent wins on paper that are losses in reality.
The most important section.
The hypothesis is what separates a disciplined experiment from a random change. If you can't write yours in the template below, you don't understand the test yet.
- Template: 'We believe [change] will cause [metric] to [direction + magnitude] because [reason].'
- Example: 'We believe adding social proof to the pricing page will increase trial signups by 8-15% because current visitors drop off at the same scroll depth as our competitor benchmarks.'
- If the hypothesis doesn't fit the template, rewrite it before going further. (A structured sketch of the template's fields follows this list.)
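If your team keeps briefs in a tracker, the template reduces to five fields. A minimal sketch in Python (the `Hypothesis` class and its field names are illustrative, not part of the template or any library):

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    change: str     # what you're modifying
    metric: str     # the primary metric you expect to move
    direction: str  # "increase" or "decrease"
    magnitude: str  # expected effect range, e.g. "8-15%"
    reason: str     # the causal mechanism you believe in

    def render(self) -> str:
        # Produces the brief's one-sentence hypothesis statement.
        return (f"We believe {self.change} will cause {self.metric} "
                f"to {self.direction} by {self.magnitude} "
                f"because {self.reason}.")

print(Hypothesis(
    change="adding social proof to the pricing page",
    metric="trial signups",
    direction="increase",
    magnitude="8-15%",
    reason="current visitors drop off at the same scroll depth "
           "as our competitor benchmarks",
).render())
```

If any field is hard to fill in, that's the signal the test isn't understood yet.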
What you're actually changing.
Specify the control and every variant explicitly. Include screenshots or Figma links. Write down what is changing and what is staying the same. The most common test failure mode is 'we thought we were testing X but actually tested X+Y'.
- Control: exact current state (screenshot + URL).
- Variant(s): each change, explicitly enumerated.
- Traffic split: typically 50/50 for two variants, 33/33/33 for three (a bucketing sketch follows this list).
- What is NOT changing (an explicit list to prevent scope creep).
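One way to make the traffic split deterministic (a sketch, assuming hash-based bucketing; `assign_variant` is a hypothetical helper, not a library call) is to hash the user and experiment IDs so a returning visitor always lands in the same variant:

```python
import hashlib

def assign_variant(user_id: str, experiment: str,
                   variants=("control", "variant_a"),
                   weights=(0.5, 0.5)) -> str:
    # Hash user + experiment so assignment is stable across sessions
    # and independent across experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    point = int(digest[:8], 16) / 0xFFFFFFFF  # uniform in [0, 1]
    cumulative = 0.0
    for variant, weight in zip(variants, weights):
        cumulative += weight
        if point <= cumulative:
            return variant
    return variants[-1]  # guard against float rounding

# 33/33/33 three-way split from the bullet above:
assign_variant("user-42", "pricing-social-proof",
               variants=("control", "variant_a", "variant_b"),
               weights=(1/3, 1/3, 1/3))
```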
Primary, secondary, guardrail.
Every experiment needs three categories of metrics: primary (what you're optimizing), secondary (what you're watching), and guardrail (what would cause you to halt the test even if the primary metric is winning). Skipping guardrails is how teams ship wins that hurt the business.
- Primary: one metric. Pre-committed.
- Secondary: 2-3 metrics you're monitoring but not optimizing.
- Guardrails: metrics that would stop the test (e.g., support tickets up >20%; a check is sketched after this list).
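To make 'would stop the test' concrete, a minimal guardrail check might look like this (metric names and thresholds are illustrative, taken from the bullet above):

```python
def guardrails_breached(baseline: dict, current: dict, limits: dict) -> list:
    """Return the guardrail metrics whose relative increase
    exceeds the pre-committed halt threshold."""
    breached = []
    for metric, max_increase in limits.items():
        change = (current[metric] - baseline[metric]) / baseline[metric]
        if change > max_increase:
            breached.append(metric)
    return breached

limits = {"support_tickets": 0.20, "refund_rate": 0.10}  # halt thresholds
baseline = {"support_tickets": 150, "refund_rate": 0.030}
current = {"support_tickets": 190, "refund_rate": 0.028}
print(guardrails_breached(baseline, current, limits))  # ['support_tickets']
```

Here 190 vs 150 is a 26.7% rise, past the 20% threshold, so the test halts even if the primary metric is winning.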
Sample size and analysis plan.
Pre-commit to the sample size before the test runs. Otherwise 'we'll stop when we hit significance' becomes 'we stopped when we saw the number we wanted'. The statistical framework you use (frequentist or Bayesian) matters less than applying it consistently.
- Target sample size: derived from the MDE, baseline conversion rate, and desired power (the calculation is sketched after this list).
- Statistical framework: frequentist (α = 0.05) or Bayesian (posterior probability > 95%). Pick one, stick with it.
- Stopping rule: explicit (e.g., 'stop at 14 days OR target sample size, whichever comes first').
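The first bullet's derivation is a standard power calculation. A sketch using the two-sided two-proportion normal approximation (the function name and defaults are ours, not from any particular tool):

```python
from statistics import NormalDist

def sample_size_per_variant(baseline_cr: float, relative_mde: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate users needed per arm for a two-sided
    two-proportion z-test."""
    p1 = baseline_cr
    p2 = baseline_cr * (1 + relative_mde)  # conversion rate if the MDE is hit
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2
    return int(n) + 1

# 4% baseline conversion, 10% relative MDE, α = 0.05, 80% power:
print(sample_size_per_variant(0.04, 0.10))  # ≈ 39,473 users per arm
```

Smaller MDEs inflate the required sample fast: halving the MDE roughly quadruples n, which is why the MDE belongs in the brief, pre-committed.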
Who does what.
Every test should have a single owner accountable for shipping, analyzing, and communicating the result. Committees don't own tests — people do. Name the person. Include the shipping engineer, the analyst, and the comms owner explicitly.
- Owner: one person, accountable end to end.
- Shipping engineer: who deploys the variant.
- Analyst: who reviews the statistics at the end of the test.
- Comms: who writes the post-test summary and shares it.
The highest-signal test briefs we've seen fit on one printed page. The lowest-signal ones sprawl across five Notion pages and include a 'background' section nobody reads.
Get the Notion, Google Doc, and PDF versions.
All three formats, shared with you in one email.
Related for your role.
Stack Consolidation ROI Calculator
Enter what you pay Optimizely, Crayon, Hotjar, and Ahrefs today. See what Optimize Pilot would cost instead, and how much headcount the delta covers.
The Testing Velocity Playbook
How top CRO teams ship 6–8 experiments per quarter without sacrificing statistical rigor. Includes the idea-to-ship workflow we see in the top 10%.
Statistical Significance, Actually Explained
P-values without the jargon. When to use frequentist vs Bayesian methods. When to stop a test early without lying to yourself. A practical primer for CRO teams.
Or let Navigator AI draft them.
Navigator AI produces pre-filled experiment briefs from your top-ranked hypotheses. Review, edit, ship — no blank page.