Knowledge · Reference

Common Pitfalls

Eight failure modes that quietly burn weeks of testing effort. Worth scanning before every launch and again before declaring a winner.

Peeking and early stopping

Calling a winner before the planned run length inflates your false-positive rate.

The sample-size math assumes one read at the end. Each interim check adds chances to randomly cross the significance line. Teams that peek and stop early routinely ship variants that look great on day five and revert to flat by month two.

Underpowered tests

A flat result on too-small a sample is not evidence of no effect.

If your test lacked the sample size to detect your MDE, a flat outcome means you don't know — not that the change is neutral. Re-running with a tighter MDE or longer duration is usually the right call before retiring the idea.

Engagement ≠ business value

A clickier element that doesn't convert is noise, not a win.

CTR went up but conversion didn't? That's not always a win — sometimes it's a more clickable element that pulls attention without changing purchase behaviour. Pair every mid-funnel metric with the downstream one that actually matters.

Audience overlap

Concurrent tests on the same traffic can contaminate each other.

If another activity is running on overlapping audiences at the same time, lifts can flow between them in either direction. Coordinate with the other activity owner, stagger your runs, or explicitly exclude their audience from yours.

Seasonality and external events

Black Friday, a press cycle, or a launch elsewhere will skew your numbers.

Anything that shifts traffic mix or intent during the run will show up in your metric. Check the calendar before launch, and if something big lands mid-test, document it — you may need to throw the result out or extend.

Forgotten guardrails

A winner that breaks something else isn't necessarily a winner.

A variant that lifts conversion but also bumps page-load time, error rate, or support tickets isn't a clean win. Decide what you wouldn't ship even for a lift — latency, accessibility regressions, downstream churn — and watch those metrics alongside the primary.

Heavy DOM manipulation

VEC activities on fragile markup break silently — and your control gets corrupted.

Pages with frequent layout changes, dynamic IDs, or heavy SPA rerenders will eventually invalidate your VEC selectors. The variant stops applying, or worse, applies inconsistently, and you don't notice until results look weird. Lean toward form-based composer or custom code on high-risk pages.

Skipping compliance review

Retrofitting compliance after launch creates rework — sometimes a lot of it.

Regulated industries — financial services, health, super, telco — often need legal or compliance sign-off on disclaimers, consent language, or contact-frequency rules before a variant goes live. Pulling it in at the design stage costs minutes; pulling it in after launch can cost a re-run.