
2. Tools of the CRO trade

The implementation of an experiment is often a tricky technical process. From calculations to variation assignment, and from choosing content versions to collecting the data, CRO requires an extraordinary amount of cooperation between different components of the average digital technology stack.

Conversion Rate Optimization is situated precariously between many different parts of a digital organization.

It’s a marketing discipline because most of the time its purpose is to generate business growth through optimizing marketing efforts.

It’s a software development discipline because it adds additional processing requirements on the client-side code.

It’s an analytics discipline because it relies on robust data collection to validate the experiment results.

It’s a user experience design discipline because it frequently experiments with removing bottlenecks from the site or app that hinder the visitor’s path to conversion.

On top of these, CRO relies on certain technologies that can be controversial in their own right.

Example

Visitors are randomly distributed to different groups based on which variant they should see and interact with. Information about this needs to be persisted so that the visitors stay in their groups throughout the experiment. To enable this, browser cookies or other persistent storage mechanisms are frequently used.

Similarly, A/B tests are often run with client-side technologies. This means that it’s possible that the actual variant isn’t applied until the user loads the web page or app. This, in turn, results in the risk of the so-called flicker effect, where the original (control) version of the experiment is briefly shown to the user before the client-side code kicks in and replaces it with the variation.

This delicate balance between running the experiment and preserving the general usability of the site has resulted in a proliferation of different tools and technical solutions for running experiments. In this Topic, we’ll take a look at some of these and discuss their implications from a technical marketer’s perspective.

Variant assignment and persistence

One of the first technical problems you need to solve with an experimentation tool is how to assign visitors to variants and how to persist this information.

Example

You want to run a simple A/B test with just the control and one variant to experiment with. You’ve decided on a 50/50 split of traffic without any further targeting rules.

This sounds simple, right? For every visitor to the page, just use a simple JavaScript method to pick one of two options at random, assign the visitor to a group based on this calculation, and show them the correct variant for the duration of the experiment.

Well, generating a random number is simple enough (although if you want to go down the rabbit-hole you can research how difficult true randomness is in a computational context). But assigning it to a visitor consistently is more difficult.

Firstly, to persist something on the web you need browser storage. This was discussed in an earlier Topic. Typically, browser cookies would be used for this, but other browser storage can also be utilized.

So once you have determined that the visitor should be either in group A or group B, you need to store this information in their browser so that for the duration of the experiment they will always be in group A or group B. Again, sounds simple, right?
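
To make that concrete, here’s a minimal sketch of what such an assignment-and-persistence step might look like in client-side code. The cookie name and the 30-day lifetime are arbitrary assumptions for this example:

```typescript
// Minimal sketch: assign the visitor to group "A" or "B" once, then persist it in a cookie.
// The cookie name and the 30-day lifetime are arbitrary choices for this example.
const COOKIE_NAME = "exp_cta_test";

function readGroup(): string | null {
  const match = document.cookie.match(new RegExp(`(?:^|; )${COOKIE_NAME}=([^;]*)`));
  return match ? match[1] : null;
}

function assignGroup(): string {
  let group = readGroup();
  if (!group) {
    // 50/50 split: Math.random() is pseudo-random, but fine for traffic allocation.
    group = Math.random() < 0.5 ? "A" : "B";
    const maxAge = 60 * 60 * 24 * 30; // keep the assignment for 30 days
    document.cookie = `${COOKIE_NAME}=${group}; path=/; max-age=${maxAge}`;
  }
  return group;
}

const visitorGroup = assignGroup();
```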

But what if the visitor clears their browser storage? What if they browse with incognito or private browsing mode, which deletes all storage after each session? What if they visit again with a different browser or device?

This is a risk of running experiments on the user’s device: there’s no way for you to adequately control visitor behavior. Depending on your market segment, your users might clear their cookies more frequently than visitors to other sites do, which directly increases the risk of a single visitor being exposed to different variants of the same experiment.

Seeing different variants of the same experiment can introduce mistrust and confusion, making the visitor’s journey to conversion even more of an uphill struggle. It can also dilute the measured effect size of the experiment, because this partial inability to recognize returning users increases the likelihood of false negatives in the experiment results.

Example

Let’s say you’ve run an experiment where the control had a 5% conversion rate and the variation had a 6% conversion rate. In this case, the effect size was 6 / 5 = +20%. Users who saw both the control and the variation (because their group assignment wasn’t persisted) would have a conversion rate somewhere between 5% and 6%, for example 5.5%. Let’s assume the percentage of users who were exposed to both versions was 30%.

When taking into account these variables, we can calculate that the conversion rate for the control was 30% * 5.5% + 70% * 5% = 5.15% and for the variation 30% * 5.5% + 70% * 6% = 5.85%. Thus the measured effect size was 5.85 / 5.15 = +13.6% instead of the real effect size of +20%.
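
If you want to play with these numbers, here’s a small sketch that computes the diluted effect size for any combination of true conversion rates and cross-exposure share. The figures in the comments match the example above, assuming cross-exposed users convert at the midpoint of the two rates:

```typescript
// Sketch: how cross-exposure dilutes the measured effect size.
// Conversion rates are percentages; contamination is a fraction (0.3 = 30% saw both versions).
function dilutedEffect(control: number, variation: number, contamination: number) {
  const mixed = (control + variation) / 2; // assume cross-exposed users convert at the midpoint, e.g. 5.5%
  const measuredControl = contamination * mixed + (1 - contamination) * control;     // 5.15%
  const measuredVariation = contamination * mixed + (1 - contamination) * variation; // 5.85%
  return {
    trueEffect: variation / control - 1,                     // +20%
    measuredEffect: measuredVariation / measuredControl - 1, // ≈ +13.6%
  };
}

console.log(dilutedEffect(5, 6, 0.3));
```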

Factoring in the uncertainty of group assignment is part of running client-side experiments. Calibrating your measurement to account for the size of that uncertainty, for example by segmenting the data by browser, goes a long way toward figuring out how many false negatives were recorded.

Ultimately, it might be useful to move away from the fragility of browser-based persistence and instead run the most important experiments against audiences with more deterministic group assignment, such as logged-in users.

Deep Dive

Improve the persistence of group distribution

The fickle nature of browser storage might tempt your organization to look at other solutions for persisting the visitor’s group assignment. After all, the better you can ensure each visitor consistently interacts with the same experiment variant, the more you can trust that the results are valid.

There are some solutions to this.

For example, if you’re OK with only showing experiments to logged-in visitors, you can store their experiment metadata directly in the user data table of your database. Naturally, this would exclude from the experiments all visitors who have not signed up for your service.

Another option is fingerprinting, which is often floated as a solution to the fragility of browser storage. The group information could be keyed to a browser fingerprint, which would improve the chances of the visitor always being assigned to the same group as long as they always use the same browser. Since it still breaks down across browsers and devices, it doesn’t really solve the bigger picture, and it introduces severe privacy risks in return.

Don’t miss this fact!

While persisting group information with 100% reliability is practically impossible, it’s good to understand the inherently unreliable nature of using browser storage for persistence. When you have a large enough sample size to run experiments but have a hard time reaching significant results, it could be wise to investigate whether your A/B testing solution has a problem recognizing returning users in your experiment.

Running the experiment client-side

Designing the test typically happens with a tool or service that lets you dynamically select the elements on a web page that you want to experiment with.

Note that many drag-and-drop solutions generate poorly designed JavaScript for the code to be executed. It’s a good idea to loop in developers when working on test design, so that you can customize the code to be more performant and reliable.

Example

If you want to test two different CTA elements, the tool would let you select the element on the page, apply the changes to it, and then the tool can make use of this information when its scripts are loaded in the visitor’s browser.

When the experimentation tool is then loaded in the visitor’s browser and when the visitor’s group has been determined (either through random assignment or by loading the information from browser storage), the changes stored in the experiment are applied to the page.

In the case of a classic A/B test, for example, individual elements on the page might change to reflect the variant the user should be exposed to. Following the example above, instead of seeing the text “Buy now!” (the control), they’ll see “Subscribe now!”.
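
In heavily simplified terms, and ignoring for a moment the flicker concerns discussed below, the variant application could look something like this sketch; the selector and the copy are hypothetical placeholders:

```typescript
// Sketch: apply the CTA variant once the group assignment is known.
// The selector and the copy are hypothetical placeholders.
function applyVariant(group: string): void {
  const cta = document.querySelector<HTMLElement>("#main-cta");
  if (!cta) return; // the element isn't on this page, so there's nothing to do

  if (group === "B") {
    cta.textContent = "Subscribe now!"; // the variation
  }
  // Group "A" keeps the original "Buy now!" (the control).
}

// In practice the group comes from the assignment step sketched earlier.
applyVariant("B");
```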

With multivariate tests, there can be many different elements that change on the page.

There are also redirect tests where instead of changing individual elements, client-side code automatically redirects the user to a different URL if their group assignment requires it. This can be a useful remedy for the flicker problem, as long as the redirection is done server-side. If the redirection is done client-side with JavaScript, it will be impacted by the flicker effect too.
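
For reference, the client-side version of a redirect test is usually not much more than the following sketch (the paths are placeholders). Because the original page has already started rendering by the time this runs, the flicker caveat applies:

```typescript
// Sketch: client-side redirect test. The original page has already started rendering
// by the time this runs, so a flash of the original content is still possible.
function redirectToVariant(group: string): void {
  if (group === "B" && window.location.pathname === "/landing") {
    // replace() keeps the original URL out of the browser history.
    window.location.replace("/landing-variant-b");
  }
}

redirectToVariant("B"); // in practice the group comes from the assignment step
```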

As the visitor sees the element being treated by the experiment, they might change their behavior on the site as a result.

In experiments, data is constantly collected about the visitor to see whether they are converting against the objectives set for the experiment.

For example, if the website is testing different checkout steps, then it’s important to collect data about how the visitor moves through the ecommerce funnel.

Deep Dive

The flicker effect

Flicker, or “flash of original content” (FOOC), refers to the phenomenon where the experimentation tool applies the change to the page element (based on the user’s assignment to a variant) with a small delay.

There’s thus a very short period of time when the visitor sees the original content before the variant is applied.

This can be confusing to the user, especially if the site is testing different sales copy versions and the visitor sees a different pitch before the variant is applied.

There are solutions to the flicker effect, which include:

  1. Only change elements below the fold – that way the visitor needs to scroll down to see the element, which gives the experiment script more time to update the element. Note! This lowers the Power of the experiment, because you’ll only be able to impact the behavior of users who scrolled down far enough.
  2. Hide the original element initially – regardless of the visitor’s group assignment, the element is hidden until the script is ready to show either the control or the variant (see the sketch after this list). This naturally leads to the page having an empty spot where the element should be, but sometimes this is better than a flicker of the wrong element.
  3. Hide the entire page initially – some tools take the drastic stance of hiding the entire page until the element is ready to be rendered. The good thing about this is that it leads to less confusion because it just shows up as a slightly slower page load. The bad thing is that you are delaying the page load in favor of an A/B test, which might be difficult to justify in your organization.
  4. Serve the updated content from the server – this is related to server-side experiments (see below). Instead of dynamically updating the element in client-side code, the web server serves the HTML with the correct element in place. This requires much more work than purely client-side solutions.
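
As an illustration of option 2 above, here’s a minimal sketch of an anti-flicker snippet that hides the element until the experiment script has made its decision. The selector and the two-second safety timeout are assumptions for this example:

```typescript
// Sketch of option 2: hide the element until the experiment script has made its decision.
// The selector and the two-second safety timeout are arbitrary choices.
const style = document.createElement("style");
style.id = "anti-flicker";
style.textContent = "#main-cta { visibility: hidden; }";
document.head.appendChild(style);

function revealElement(): void {
  document.getElementById("anti-flicker")?.remove();
}

// Call revealElement() as soon as the experiment script has applied (or skipped) the change.
// The safety timeout guarantees the element never stays hidden if the script fails to load.
setTimeout(revealElement, 2000);
```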

The flicker effect might not always be such a huge problem. Visitors are used to seeing elements loading lazily on the web, and they might not even notice the flicker most of the time.

Ready for a quick break?

Your eyelids are already flickering – it’s time to take a break.

Data collection

When collecting data about an experiment, it’s of course important to always include the visitor’s group assignment in the collected data.

Note that most A/B testing tools add the user to an experiment group as soon as they load a page that’s part of the experiment. This means that when you analyze results, the experiment might include data from users who might never have seen or even interacted with the element that was changed. It’s a good idea to “tag” users who were exposed to the changed element for an appropriate amount of time, to make it easier to focus your analysis on this cohort.
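
As a sketch, exposure could be recorded for later segmentation along these lines; the dataLayer push follows a common tag-management pattern, but the event and field names are assumptions:

```typescript
// Sketch: record the exposure so that analytics data can be segmented by experiment group.
// The event name and field names are hypothetical; adapt them to your analytics setup.
const dataLayer: Record<string, unknown>[] =
  (window as any).dataLayer ?? ((window as any).dataLayer = []);

function trackExposure(experimentId: string, group: string): void {
  dataLayer.push({
    event: "experiment_exposure",
    experiment_id: experimentId,
    experiment_group: group,
  });
}

// Ideally fire this only once the changed element has actually been in view for a while,
// so the analysis can focus on visitors who were really exposed to the change.
trackExposure("checkout_cta_test", "B");
```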

Once the analytics tool collects experiment data from visitors, subsequent analysis can include information such as:

  1. How many visitors have been added to each group. This is useful if you want to verify that traffic allocation to different variants works as intended.
  2. How many visitors are converting against the objectives set for the experiment.
  3. What else these visitors are doing on the site.

The first two points are instrumental in validating an experiment. Once the experiment is over, this data is used to calculate the statistical significance of the experiment result and whether the variant(s) or the control “won”.

If a group “wins” the experiment, it means that you have measured an uplift in the conversion metric (compared to the other variant) that is most likely not the result of random chance.

The winning variant as shown by https://abtestguide.com/calc/
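
For the curious, calculators like the one linked above typically boil down to a two-proportion z-test, roughly sketched below; real tools add refinements such as one- vs. two-sided tests and various corrections:

```typescript
// Sketch: two-proportion z-test, the kind of calculation A/B test calculators run.
function zScore(convA: number, visitorsA: number, convB: number, visitorsB: number): number {
  const pA = convA / visitorsA;
  const pB = convB / visitorsB;
  const pooled = (convA + convB) / (visitorsA + visitorsB);
  const standardError = Math.sqrt(pooled * (1 - pooled) * (1 / visitorsA + 1 / visitorsB));
  return (pB - pA) / standardError; // |z| > 1.96 ≈ significant at the 95% level (two-sided)
}

// Example: 500/10,000 conversions for the control vs. 600/10,000 for the variant.
console.log(zScore(500, 10000, 600, 10000)); // ≈ 3.1 → statistically significant
```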

While it’s important to stick to the conversion goals you configured when starting the experiment, and while it’s equally important to let the experiment run its full duration, analyzing secondary effects of the experiment can be very fruitful for generating inputs for new tests (with more evidence, of course).

Analytics tools allow you to segment the data based on the visitor’s assignment to different experiments and groups. Sometimes you might see patterns that you didn’t originally envision as the outcome of the experiment.

Example

Your checkout steps test didn’t result in a statistically significant “winner” with regard to the conversion goal you picked. However, you can still use your analytics data to see that perhaps visitors in a certain group were more likely to struggle with checkout errors than users in another group. This could be valuable input for a new test on how checkout errors are exposed, for example.

Don’t miss this fact!

Remember that even though your experimentation tool relies on a shortlist of conversion goals to validate the experiment, you have the full capacity of your analytics tool to dig deeper into user behavior while the experiment was running. Look for patterns in user behavior when segmenting by experiment group. Use this data to generate additional hypotheses for future experiments!

Server-side experiments

Running experimentation logic in the web server rather than in the visitor’s browser is an interesting prospect. In fact, it seems to solve many of the problems with client-side experimentation.

With server-side experiments, the visitor’s group assignment and the content delivery are handled by the web server itself. In other words, when you navigate to a website that’s running an A/B test, you have already been assigned to a variant by the time the page appears in your browser, and the page loaded from the web server has all the variant elements in place.
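
As a minimal sketch of this approach, assuming a Node-style server and a stable user identifier (for example, a logged-in user ID), group assignment can be made deterministic by hashing that identifier; anonymous visitors would still need a cookie-based ID as a fallback:

```typescript
// Sketch: deterministic server-side assignment based on a stable user identifier.
// Hashing the ID means the same user always lands in the same group without browser storage;
// anonymous visitors would still need a cookie-based ID as a fallback.
import { createHash } from "crypto";

function assignServerSide(userId: string, experimentId: string): "control" | "variation" {
  const hash = createHash("sha256").update(`${experimentId}:${userId}`).digest();
  // The first byte of the hash gives a stable, evenly distributed 50/50 split.
  return hash[0] < 128 ? "control" : "variation";
}

// The server (or its CMS integration) then renders the matching HTML directly:
const group = assignServerSide("user-42", "checkout_cta_test");
const ctaHtml = group === "variation" ? "<button>Subscribe now!</button>" : "<button>Buy now!</button>";
```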

Another option is feature flagging, where the content served from the web server includes both the control and the variation, and a toggle evaluated client-side determines which one is shown to the user.

This can be a huge asset, because it means that the visitor’s browser doesn’t have to load heavy and disruptive JavaScript libraries just for determining the visitor’s group and updating the elements on the page.

It doesn’t solve the problem of persisting the visitor’s group assignment, but it does counter other issues with client-side testing such as the flicker effect.

However, the beauty of client-side experimentation is that it’s all handled by a JavaScript library. All you need is for the visitor’s browser to load that library, and it takes care of the rest.

With server-side experiments, you need the web server to run all this additional logic. There are so many different web server software stacks that it’s not as simple as the plug-and-play JavaScript approach.

You also need the server-side process to communicate with the site’s content management system (CMS), so that it knows to deliver the correct variation.

When server-side experimentation is discussed, the opportunity to build a custom solution can be tempting. After all, running simple A/B tests doesn’t really require that much specialized technology.

Group assignment can be done with a randomization algorithm and cookies, variation selection can be done with the CMS and some metadata, and data collection can be done with whatever analytics tool the organization is already using.

Calculating things like experiment winners and statistical significance can be done using online calculators that are readily available.

Nevertheless, server-side experimentation often requires more maturity from the organization that wants to try it. Modifying server-side processes to benefit experimentation might be an even more difficult pill for developers to swallow than the overhead of running experiments on visitors’ devices.

As a technical marketer, you are uniquely positioned to consult your organization about opportunities like these. It’s important to understand the limitations of running experiments for visitors, both client-side and server-side, before deciding which approach to follow.

You can experiment with experimentation, too! Try different approaches, tools, and services before figuring out which works best in your unique organization.

Key takeaway #1: User’s device stores information about their test group

If a user is included in an experiment, it’s important that they remain in their test group for the duration of the test. If they saw a different version of the experiment content on repeat visits, the data collected from them would become noisy and dilute the measured effect. A commonly used technology for retaining the user in the assigned group is browser storage – the user’s test participation is stored in a cookie or other browser storage that persists for the duration of the test.

Key takeaway #2: Flicker can ruin a test

When running tests in the user’s browser or device, the scripts are often loaded asynchronously. This means that the page has time to render before the user’s group assignment is determined. If the user is part of a variation group, they might briefly see the original content before it’s changed to the variant. This is called the “flicker” effect, and it can ruin the test by producing conflicting visual cues.

Key takeaway #3: Don’t collect data just for the experiment

By collecting information about experiments running on the page and the user’s group assignments into the analytics tool, you can segment your visitors based on test participation. While most of the time you’re probably focused on the conversion rate, some tests might impact the user’s behavior on the site in other, unexpected ways. These insights can feed into new hypotheses for additional tests down the line.

Quiz: Tools Of The CRO Trade

Ready to test what you've learned? Dive into the quiz below!

1. What is the flicker effect?

2. Why is it important to collect experiment information into your main analytics tool?

3. What are the benefits of server-side experiments? Select all that apply.
