Test Reporting 2.0



Background
In Shoplift’s early days as an A/B testing tool for e-commerce brands, our focus was on expanding testing capabilities to stay competitive—answering user questions like “Can I test X?” and driving growth. However, as we scaled, we noticed retention gaps: while users loved launching tests, they struggled to grasp key insights from their results. This meant some of our core value propositions weren’t landing, limiting our differentiation from competitors.

Over ~8 months, through support tickets, user calls, and session recordings, we identified a major friction point: the Test Reporting screen. While setting up tests was seamless, interpreting results over time was not. Users needed a clearer, more actionable reporting experience to turn their tests into decisions—and that’s where we focused our efforts.

Release date: February 2025

Users
Trial users & new users

Team
CPO, Product manager, Statistician, Backend engineer, Fullstack engineer

Before & after - reporting page

🔎 Discovery

The reporting experience is where users discover the value of Shoplift, and it's the core of the product. The reporting screen contains sensitive financial information that users base business decisions on every day. Our goal was to keep it simple, clear, and actionable, but as the product and our user base evolved, we needed to expand its capabilities while still maintaining our core values.

I broke the lifecycle of a test down into time periods and analyzed user sentiment for each:

User journey analysis

📚 Learnings

Through user analysis during the test-running phase, we discovered that most friction occurred within the first 72 hours, especially for new users. Here’s what we learned:

1. Users Were Ending Tests Too Soon

  • Many users prematurely stopped tests due to uncertainty.
  • Some weren’t confident their test was set up correctly.
  • Small sample sizes led to exaggerated results—big green/red numbers and drastic lifts (positive or negative) made users anxious, prompting them to pull the plug early.
  • Conflicting data added to the confusion. We ran two models: Bayesian (rolling hourly updates) and frequentist (updated daily). Some users saw a negative lift while the test was "trending positive," which bred doubt (see the sketch after this list for a simplified illustration).
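
A quick way to see how these effects compound is to put a raw lift calculation next to a Bayesian read of the same small sample. The sketch below is a minimal illustration with hypothetical conversion counts, not Shoplift's production model; it assumes simple Beta-Bernoulli posteriors with uniform priors.

```python
# Minimal sketch (hypothetical numbers, not Shoplift's actual model):
# a tiny sample can show a dramatic lift while the Bayesian read stays ambiguous.
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical traffic in the first 72 hours of a test.
control_visitors, control_conversions = 180, 9   # ~5.0% conversion rate
variant_visitors, variant_conversions = 175, 6   # ~3.4% conversion rate

# Point-estimate lift: the number users saw in big red text.
cr_control = control_conversions / control_visitors
cr_variant = variant_conversions / variant_visitors
observed_lift = (cr_variant - cr_control) / cr_control
print(f"Observed lift: {observed_lift:+.1%}")  # roughly -31%

# Bayesian read: Beta(1, 1) priors updated with the same counts.
posterior_control = rng.beta(1 + control_conversions,
                             1 + control_visitors - control_conversions, 100_000)
posterior_variant = rng.beta(1 + variant_conversions,
                             1 + variant_visitors - variant_conversions, 100_000)
prob_variant_wins = (posterior_variant > posterior_control).mean()
print(f"P(variant beats control): {prob_variant_wins:.0%}")  # ~25%: far from a conclusive loss
```

A dramatic point-estimate drop sitting next to a "could still go either way" probability is exactly the kind of mixed signal that pushed users to end tests early.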

2. Trust in Data Was an Important Consideration

  • Some users noticed differences between our numbers and what Shopify and GA4 reported, which raised questions about how we calculate metrics.
  • Our metric definitions and calculation methods differed from Shopify's and GA4's, sometimes producing different conversion rate figures.
  • Aggressive bot filtering meant our visitor counts diverged from other tools', which required further explanation (a simplified example follows this list).
  • As a newer product, Shoplift was still establishing its credibility, and helping users understand our approach was a key focus.
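
To make the denominator differences concrete, here is a deliberately simplified sketch with hypothetical numbers; these are illustrative definitions, not the exact formulas used by Shoplift, Shopify, or GA4.

```python
# Purely illustrative: the same day of orders divided by three different
# denominators yields three different "conversion rates". All numbers are
# hypothetical; none of these are the exact definitions used by any tool.
orders = 120

raw_sessions = 5_400       # every session, bots included
bot_sessions = 650         # traffic removed by aggressive bot filtering
unique_visitors = 3_900    # deduplicated human visitors

print(f"Orders / raw sessions:      {orders / raw_sessions:.2%}")                   # ~2.22%
print(f"Orders / filtered sessions: {orders / (raw_sessions - bot_sessions):.2%}")  # ~2.53%
print(f"Orders / unique visitors:   {orders / unique_visitors:.2%}")                # ~3.08%
```

None of these figures is wrong; they answer slightly different questions, which is why explaining our approach mattered as much as the numbers themselves.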

3. The Reporting Experience Was Overwhelming

  • The sheer volume of numbers, charts, and tables caused cognitive overload, especially for first-time users.
  • Some framing felt scary or negative (e.g., "Inconclusive Data" or "Insufficient Data"), leading users to second-guess their tests.
  • Managed accounts (with hands-on guidance) had a significantly smoother experience compared to self-serve users. Having someone to explain the data, set expectations, and guide best practices helped users focus on what mattered most.

4. Improving Reporting Meant More Than Just UI Changes—It Needed to Add Value

  • We were capturing valuable data but not surfacing it effectively.
  • We needed to future-proof the product—multivariate testing, DOM testing, and custom goals were on the roadmap and would require deeper reporting changes.
  • Users frequently requested additional features like goal progress tracking, time-based and segment filtering, and more granular breakdowns.
  • We had created high expectations with our advanced capabilities, yet we were missing some basic features that other A/B testing tools offered out of the box.

These insights guided our redesign efforts, ensuring that improvements were not just cosmetic but strategically addressed user concerns and friction points.


Concepts, explorations and early tests



🎢 Feedback loop

The insights we gathered shaped the scope of the work, allowing me to define a clear design hypothesis and establish success metrics. With these foundations in place, I began working on wireframes and conceptual ideas, gathering iterative feedback from key stakeholders, including the engineering team, our statistician, the customer success (CS) team, and sales. Our discussions focused on several critical areas:

1. Statistical Integrity & Accessibility
  Ensuring that our calculations were mathematically sound while presenting them in a way that was understandable and approachable for users. We aimed to balance scientific accuracy with clear, human-centered communication.

2. Technical Feasibility
  Evaluating the development effort required, prioritizing efficiency by leveraging existing data rather than introducing entirely new backend structures. The goal was to minimize complexity while maximizing the value of surfaced insights.

3. Addressing Customer Confusion
  Ensuring the design directly tackled the key pain points users faced in interpreting their test results. We worked closely with the CS team to understand recurring challenges and ensure the new design reduced reliance on support for self-serve users.

4. Business & Growth Impact
  Gathering input from leadership, including the CEO and sales team, to align the project with broader business objectives. This included assessing how the redesign would influence user activation, growth, and investor-relevant metrics like churn reduction.


Usability testing


🕹️ Usability testing

As we prepared for significant changes, it was crucial to ensure that existing users felt supported and not disrupted. To achieve this, we tested a prototype with them, including some out-of-scope elements we were curious to explore.

Testing Goals:

  • Evaluate the new navigation: Determine if users could easily find what they were looking for.
  • Assess the perceived value of the new additions to the reporting experience.
  • Gain deeper insights into how users analyze their test results.

Key Takeaways:

  • Positive to neutral reception: While some users appreciated the new look, others were indifferent and immediately focused on exploring the data.
  • Navigation clarity: Users had no difficulty finding relevant information. When asked to locate specific segment data, they correctly identified the corresponding tab on the first attempt.
  • Data transparency preference: Experienced users preferred seeing raw data upfront, rather than having it pre-interpreted. They valued having our analysis as a secondary layer.
  • Visual variant views: Users actively engaged with image-based variant displays in the previous design. This feature was a key part of their workflow and should be preserved in the new iteration.


📊 Final Designs

After incorporating refinements from usability testing and adjusting for planned scope reductions, we arrived at the final experience for developer handoff. During this phase, we addressed a few remaining edge cases, including:
  • Ensuring compatibility with tests launched prior to the update.
  • Validating that updates to the statistical model did not retroactively alter results.
  • Accounting for all potential test states and scenarios to ensure a seamless experience.


🔮 Next Steps

Following an impact analysis and user feedback, we prioritized the remaining enhancements for future development:
  • Metrics/Goal Tab: Introduce an advanced view that allows users to explore all test metrics simultaneously, with day-by-day breakdowns.
  • Segment Performance Charts: Users heavily rely on daily test performance tracking; we aim to extend this view to include segment-level insights over time.
  • Expanded Test Timeline View: Enhance the timeline to display test progress, key milestones, and changes such as trend shifts and newly reached statistical significance.
  • Upcoming Feature Integration: The introduction of multivariate testing and DOM testing will require dedicated representation within the reporting framework.
  • Comprehensive Overview Charts: Initially scrapped by the product team, but resurfaced as a critical need from the CS team. This feature will help users and support teams cross-validate test data across multiple analytics platforms (e.g., Shopify and GA4).
  • Expanded Reporting Capabilities: Extend reporting to include additional dimensions such as landing pages, product performance, geo-based analysis, and custom segment insights.

Thank You! ✨️

Mark