Experiment Setup & Monitoring

How to set up and monitor experiments — targeting, traffic allocation, managing multiple concurrent experiments, and knowing when to call a result. For technical implementation details, see the Developer Guide.

Page targeting

Each experiment is designed for a specific page or page type. The target URL shown in your Pleras dashboard is the page the experiment was built and tested against.

For experiments targeting a single page (like your homepage or a specific landing page), use that exact URL in your A/B platform. For experiments targeting a page type — such as a product detail page or collection page — the dashboard shows one example URL, but the experiment may apply across all pages of that type. In these cases, you'll need to configure a broader URL pattern in your platform (e.g. /products/* rather than the specific product URL). The experiment description will make it clear whether it targets a specific page or a page type.

Configure your platform to run each experiment only on the intended page or URL pattern. Getting this wrong is the most common source of broken experiments. When setting up URL matching, account for variations like with and without www, trailing slashes, query parameters, and locale prefixes (e.g. /en-gb/products/...).
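Most platforms let you express this with glob patterns or regexes. As a mental model, the matching above can be sketched as normalising the URL before comparing it to a pattern — the helper names below are illustrative, not part of any platform's API, and the locale-prefix regex is a deliberate simplification:

```javascript
// Sketch of URL normalisation before pattern matching (illustrative
// helpers, not any platform's real API). Matching on pathname alone
// ignores the host, which covers www vs non-www; the rest strips
// trailing slashes, query parameters, and a locale prefix, so that
// /en-gb/products/blue-shirt?utm_source=x matches /products/*.
function normalizePath(url) {
  var u = new URL(url); // query params live in u.search and are ignored
  var path = u.pathname;
  // Drop a locale prefix like /en-gb/ or /fr/ (simplified: any 2-letter
  // code, optionally with a region, would be stripped here)
  path = path.replace(/^\/[a-z]{2}(-[a-z]{2})?(?=\/)/i, '');
  // Drop a trailing slash, but keep the root "/"
  if (path.length > 1) path = path.replace(/\/$/, '');
  return path;
}

function matchesPattern(url, pattern) {
  // Convert a glob like "/products/*" into an anchored regex
  var escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, '\\$&')
    .replace(/\*/g, '.*');
  return new RegExp('^' + escaped + '$').test(normalizePath(url));
}
```

With this normalisation, `/products/*` matches `https://www.example.com/en-gb/products/blue-shirt/?utm_source=x` but not `/collections/sale` — the kind of behaviour you want to verify in your platform's own preview mode before launch.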

Audience targeting

Beyond URL targeting, consider who should see the experiment:

  • Device type — some experiments target specific device layouts. Your Pleras dashboard includes screenshots across desktop, tablet, and mobile so you can see which devices an experiment works on. Use this to decide whether to target all devices or limit to specific ones in your A/B platform.
  • New vs returning visitors — relevant for experiments testing first-impression elements like hero sections or welcome messaging
  • Traffic source — if an experiment modifies a landing page, consider whether it applies to all traffic sources or specific campaigns

Traffic allocation

Run experiments at a 50/50 split between control and variant. An even split gives you the fastest path to statistical significance and the cleanest data.

Do not change the traffic allocation during a running experiment. Adjusting ratios mid-test (for example, starting at 10/90 and moving to 50/50) changes the composition of each group over time, which invalidates the statistical comparison between them. If you need to change the split for any reason, stop the experiment and restart it with a clean cohort at the new allocation. Data from the previous run should not be carried over.
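To see why a restart gives a clean cohort, consider how deterministic assignment typically works. The sketch below is illustrative, not any specific platform's implementation: hashing user ID plus experiment ID means the same user always lands in the same arm, and a new experiment ID yields a fresh, independent assignment.

```javascript
// Sketch of deterministic 50/50 assignment (illustrative only).
// FNV-1a hash mapped to [0, 1); the same (userId, experimentId)
// pair always produces the same arm, and restarting under a new
// experimentId re-shuffles everyone independently.
function hashToUnit(str) {
  var h = 2166136261; // FNV-1a 32-bit offset basis
  for (var i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 16777619) >>> 0; // FNV prime, kept in 32 bits
  }
  return h / 4294967296; // map to [0, 1)
}

function assignVariant(userId, experimentId) {
  return hashToUnit(userId + ':' + experimentId) < 0.5 ? 'control' : 'variant';
}
```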

All QA and validation should happen before the experiment goes live (see Quality assurance before launch), not by running at low traffic.

Running multiple experiments

You should run as many concurrent experiments as your traffic can support — the more you run in parallel across different areas of your site, the faster you learn. But how you structure them determines whether your results are reliable. There are three things to manage: technical conflicts, interaction effects, and statistical power.

Technical conflicts

Two experiments that modify the same DOM region on the same page will interfere with each other. If you need to run experiments targeting the same area, use your platform's mutual exclusion groups so each user only sees one, or run them sequentially. Experiments targeting different areas of the same page, or different pages entirely, don't conflict technically and can run concurrently.
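Conceptually, a mutual exclusion group hashes each user into exactly one slot, so no user ever sees two experiments that touch the same region. This sketch is illustrative only — in practice, use your platform's built-in mutex feature rather than rolling your own:

```javascript
// Sketch of a mutual-exclusion group (illustrative; the experiment
// IDs are made up). Each user is hashed into exactly one slot, so
// traffic per experiment is 1/N of the page's visitors.
function hash32(str) {
  var h = 2166136261; // FNV-1a 32-bit
  for (var i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 16777619) >>> 0;
  }
  return h;
}

function mutexAssign(userId, groupId, experimentIds) {
  var slot = hash32(userId + ':' + groupId) % experimentIds.length;
  return experimentIds[slot];
}
```

Note the cost this sketch makes visible: with N experiments in the group, each one only receives 1/N of the page's traffic, which is exactly the statistical power tradeoff discussed below.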

Interaction effects

An interaction effect occurs when the impact of one experiment changes because another experiment is also running. The combined effect of two experiments is not always the sum of their individual effects. Understanding how interactions work — and how to manage them — lets you run concurrent experiments confidently rather than avoiding them.

How they manifest

Interaction effects take three forms:

  • Synergistic — the experiments amplify each other. A trust badge on the product page lifts conversion by 3%. Free shipping messaging on the cart page lifts it by 2%. Together, the lift is 8%, not 5%, because the trust badge makes users more willing to add to cart, and the free shipping removes the final objection at checkout. Each experiment makes the other more effective.
  • Antagonistic — the experiments diminish each other's impact. A scarcity message ("Only 3 left") on the product page lifts conversion by 4%. An urgency timer ("Sale ends in 2 hours") on the same page lifts it by 3%. Together, the lift is only 4%, not 7%. Both rely on the same mechanism — fear of missing out — and the second signal adds little because the user already feels the urgency. This is diminishing returns on the same persuasion technique.
  • Interference — the experiments actively work against each other. One experiment simplifies the checkout to reduce friction. Another adds a cross-sell module to the checkout to increase average order value. The simplification was effective because it removed distractions, and the cross-sell adds one back. Each experiment is individually sound, but they have opposing goals.

Why they happen

  • Shared psychological mechanisms — two experiments pulling the same persuasion lever (urgency and scarcity, social proof on the homepage and social proof on the product page) saturate the user's response to that mechanism. The second instance adds less than the first.
  • Changed audience composition — an experiment early in the funnel changes who reaches the next step. A homepage experiment that emphasises low price attracts price-sensitive users to the product page. A product page experiment optimised for premium positioning may not resonate with that audience. The experiments don't conflict technically, but the first one changes the population the second one is tested against.
  • Cognitive load — each individual change may be small, but collectively, multiple changes across a session make the experience feel unfamiliar. Users who are accustomed to the normal flow may become uncertain if too many things are different.
  • Competing objectives — one experiment simplifies, another adds information. One reduces steps, another introduces a step. These aren't just failing to add up — they're actively pulling in opposite directions.

How to detect them

  • Segment by co-exposure — look at an experiment's results split by whether users were also exposed to other running experiments. If experiment A shows a 5% lift for users who didn't see experiment B, but only 1% for users who also saw B, there's an interaction. Some enterprise experimentation platforms support this kind of segmentation — check whether yours does.
  • Watch for unexpected results — if an experiment that should clearly win based on the hypothesis and prior evidence comes back flat or negative, and other experiments were running concurrently in the same flow, interaction effects are a likely explanation.
  • Monitor guardrail metrics — if your primary metric looks fine but secondary metrics (bounce rate, pages per session, time on site) have moved unexpectedly, that can indicate interaction effects even when the primary metric doesn't show them.
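If your platform exposes raw assignment logs, co-exposure segmentation is straightforward to compute yourself. The record shape below is hypothetical — adapt it to whatever your export actually contains:

```javascript
// Sketch of co-exposure segmentation on raw assignment logs (record
// shape is hypothetical: armA is the user's arm in experiment A,
// sawB is whether they were also exposed to experiment B).
// A large gap between the two segments suggests an interaction effect.
function liftByCoExposure(records) {
  function rate(rs, arm) {
    var seen = rs.filter(function (r) { return r.armA === arm; });
    var conv = seen.filter(function (r) { return r.converted; });
    return seen.length ? conv.length / seen.length : 0;
  }
  var out = {};
  [true, false].forEach(function (sawB) {
    var seg = records.filter(function (r) { return r.sawB === sawB; });
    var c = rate(seg, 'control');
    var v = rate(seg, 'variant');
    // Relative lift of A within this segment (null if no control data)
    out[sawB ? 'withB' : 'withoutB'] = c ? (v - c) / c : null;
  });
  return out;
}
```

A result like `{ withoutB: 0.05, withB: 0.01 }` is the 5%-vs-1% pattern described above. Treat small segments with caution — splitting by co-exposure reduces sample size in each cell.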

How to mitigate them

  • Vary the persuasion techniques across concurrent experiments — if you're running a social proof experiment on the homepage, don't simultaneously run another social proof experiment on the product page. Spread different psychological approaches across the funnel.
  • Think about the end-to-end journey — consider what a user who sees all your concurrent experiments experiences in sequence. Does the combination create a coherent journey, or does it feel contradictory or overwhelming?
  • Run high-stakes experiments in isolation — for foundational changes (navigation overhaul, pricing restructure, checkout redesign), the risk of interaction effects contaminating your results is high enough that clean attribution matters. Run these on their own.
  • Accept the tradeoff for standard experiments — research from large-scale experimentation platforms has found that interaction effects, while real, are typically small relative to the main treatment effect. For most standard experiments, the velocity gain from running concurrently outweighs the measurement noise. Don't let the pursuit of perfect attribution slow down your experimentation programme.

Measuring what matters

Your primary metric should be the outcome that matters to your business — typically conversion or revenue. For experiments early in the funnel (homepage, landing pages), a step-level metric like click-through to the next page can be a reasonable primary metric because measuring final conversion from that distance would be noisy and slow to reach significance. But if you do this, always monitor downstream conversion as a guardrail. A homepage variant that increases clicks but doesn't improve — or actively hurts — downstream conversion is not a win.

Statistical power

Experiments on different pages don't split your traffic — every user can see all of them. Traffic is only divided when experiments are mutually exclusive, such as when multiple experiments target the same page and are placed in mutex groups. In that case, each experiment only receives a fraction of that page's visitors, extending the time to reach significance.

If your site gets 10,000 monthly visitors to a page and you run 5 mutually exclusive experiments on it, each one only sees 2,000 — which may not be enough to detect a meaningful effect in a reasonable timeframe. Factor this into how many same-page experiments you run concurrently.
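The arithmetic above can be wrapped in a quick planning helper — a rough sketch that assumes a 50/50 split within each experiment and takes the required per-arm sample from your sample size calculator:

```javascript
// Rough planning arithmetic for mutex groups (a sketch, not a power
// calculation itself — requiredPerArm comes from a sample size
// calculator). Assumes an even mutex split and a 50/50 arm split.
function weeksToSignificance(monthlyVisitors, mutexExperiments, requiredPerArm) {
  var perExperimentWeekly = (monthlyVisitors / mutexExperiments) * (12 / 52);
  var perArmWeekly = perExperimentWeekly / 2; // 50/50 control/variant
  return Math.ceil(requiredPerArm / perArmWeekly);
}
```

With the numbers from the example (10,000 monthly visitors, 5 mutex experiments) and a modest 1,000-per-arm requirement, each experiment already needs around 5 weeks — and realistic sample size requirements are often much larger.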

Maximising velocity

Maximise both the number of concurrent experiments your traffic supports and the speed at which you cycle through them. The biggest drag on experimentation velocity is usually the gap between experiments, not the experiments themselves.

When an experiment wins, you have two options: implement the change into your codebase, or push the winning variant to 100% of traffic and move on to the next experiment. Both are valid. Implementing into the codebase is the long-term best practice — it's permanent, performant, and doesn't depend on your testing platform. But it takes development time, and waiting for implementation before launching the next experiment slows your programme down. Running winners at 100% keeps your velocity high while your team schedules the permanent implementation.

If you're building on previous results — for example, your next experiment targets a page where a previous winner is still running at 100% — let us know and we'll update the new experiment to work on top of it.

Sample size and duration

  • Decide upfront how long you'll run — use a sample size calculator before starting. You need your current baseline conversion rate, the minimum detectable effect you care about, and your daily traffic. This tells you how long the test needs to run.
  • Run for full weekly cycles — run for at least one full week (ideally two or more) to capture day-of-week variation, even if you hit your sample size sooner. If your business has monthly patterns (e.g. payday effects), account for those too.
  • Understand your platform's statistical method — some platforms use fixed-horizon frequentist testing, where checking results early inflates false positives. Others use sequential testing or Bayesian methods that are designed for continuous monitoring. Know which one your platform uses and follow its guidance on when to call a result. If you're not sure, err on the side of running longer.
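If you want to sanity-check a calculator's output, the standard per-arm sample size formula for a two-proportion test can be computed directly. This sketch assumes the conventional defaults of 95% confidence and 80% power; swap in your own z-values for other settings:

```javascript
// Per-arm sample size for a two-proportion test (standard formula;
// a sketch for sanity-checking a calculator, with defaults of
// 95% confidence / 80% power baked in as z-values).
function sampleSizePerArm(baseline, relativeMde) {
  var zAlpha = 1.96;  // two-sided, alpha = 0.05
  var zBeta = 0.8416; // power = 0.80
  var p1 = baseline;
  var p2 = baseline * (1 + relativeMde); // e.g. 0.10 for a +10% lift
  var pBar = (p1 + p2) / 2;
  var a = zAlpha * Math.sqrt(2 * pBar * (1 - pBar));
  var b = zBeta * Math.sqrt(p1 * (1 - p1) + p2 * (1 - p2));
  return Math.ceil(Math.pow(a + b, 2) / Math.pow(p2 - p1, 2));
}
```

For a 3% baseline conversion rate and a 10% relative minimum detectable effect, this lands around 53,000 visitors per arm — a useful reminder of why small sites should test bigger changes (larger MDEs need far fewer visitors).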

Quality assurance before launch

Before activating an experiment for real traffic:

  1. Preview in your platform — most A/B testing tools have a preview or quality assurance mode. Use it if yours does.
  2. Test on multiple devices — check desktop, tablet, and mobile. Experiments include responsive CSS, but your site's own styles and layout may interact with them unexpectedly.
  3. Test on multiple browsers — experiments use ES5-compatible JavaScript (var, no arrow functions, no template literals) for broad browser support. But CSS rendering can still vary.
  4. Check page speed — experiments add DOM elements and styles. The impact should be negligible, but verify that Largest Contentful Paint (LCP) and Cumulative Layout Shift (CLS) metrics aren't degraded using your browser's Performance tab or Lighthouse.
  5. Verify tracking fires — open the browser console, filter for dataLayer, and confirm experiment_exposure fires on load and experiment_interaction fires on relevant user actions.
  6. Check for flicker — if your platform injects code asynchronously, users may briefly see the original page before the experiment applies. Most platforms offer anti-flicker snippets. Use them.
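For step 5, a small console snippet can surface experiment events as they fire instead of scrolling through the full dataLayer. This is a QA sketch to paste into the browser console, not production code; the simulated pushes at the end stand in for what the experiment itself would send:

```javascript
// Console helper for the tracking QA step (a sketch; adapt to your
// setup). Wraps dataLayer.push so every experiment_* event is logged
// as it fires.
var dataLayer = (typeof window !== 'undefined' && window.dataLayer) || [];
var captured = [];
var originalPush = dataLayer.push.bind(dataLayer);
dataLayer.push = function (event) {
  if (event && /^experiment_/.test(event.event)) {
    captured.push(event.event);
    console.log('experiment event:', event);
  }
  return originalPush(event);
};

// Simulated events — on a real page these come from the experiment code.
dataLayer.push({ event: 'experiment_exposure', experiment_id: 'exp-42' });
dataLayer.push({ event: 'page_view' });
dataLayer.push({ event: 'experiment_interaction', experiment_id: 'exp-42' });
```

Install the wrapper before reloading the page, then confirm you see `experiment_exposure` on load and `experiment_interaction` when you perform the relevant action.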

Monitoring a live experiment

Once an experiment is running:

  • Check error rates — monitor your browser error tracking (Sentry, LogRocket, etc.) for any new JavaScript errors on experiment pages
  • Watch bounce rate — a sharp increase in bounce rate on the variant is a red flag, even before you have conversion data
  • Revenue guardrails — if you're testing on high-traffic commercial pages, keep an eye on revenue per visitor. If it drops meaningfully in the variant, stop the experiment. Some platforms allow you to set this up as an automatic rule, but most will require you to monitor it manually.

When to stop an experiment

  • Clear winner — the variant is statistically significant at your chosen confidence level (typically 95%) with a meaningful effect size
  • Clear loser — the variant is statistically significant in the negative direction at the same confidence level — it's making things worse
  • No effect — you've run for the planned duration and the result is flat. This is still a valid learning.
  • Technical issues — console errors, broken layouts, tracking failures. Stop immediately, fix, and restart with a clean cohort.
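For platforms using fixed-horizon frequentist testing, the "clear winner / clear loser" check is a two-proportion z-test. The sketch below is only valid when evaluated once, at the planned end of the test — peeking at it daily inflates false positives:

```javascript
// Sketch of a fixed-horizon two-proportion z-test for calling a
// result (evaluate once at the planned end date, not continuously).
function zScore(convControl, nControl, convVariant, nVariant) {
  var pC = convControl / nControl;
  var pV = convVariant / nVariant;
  var pPool = (convControl + convVariant) / (nControl + nVariant);
  var se = Math.sqrt(pPool * (1 - pPool) * (1 / nControl + 1 / nVariant));
  return (pV - pC) / se; // |z| > 1.96 is significant at 95% (two-sided)
}
```

For example, 500/10,000 control conversions against 560/10,000 variant conversions gives z ≈ 1.89 — just short of the 1.96 threshold, so not yet a clear winner at 95%.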

Don't restart a stopped experiment and count it as the same test. If you fix something and re-launch, treat it as a new experiment with fresh data.