There are several ways a test like this can be evaluated. In an ideal world, a controlled A/B test is the most statistically robust option; however, for this client it was not appropriate given the campaign structure and the nature of the test.
As a result, the alternative testing techniques and their limitations can be summarised as follows:
Pre vs. Post - Comparing the figures before and after a change in the treatment group.
Assumes there are no seasonal or market effects, i.e. there is no control reference.
Difference in Differences (DiD) - Quasi-experimental approach that compares the change in outcomes over time between the treatment and control groups.
Relies on a control group that is assumed to follow a parallel trend with the treatment group.
Generally produces a point estimate, i.e. a single value for the pre vs. post difference.
Synthetic Control and/or Causal Impact* - Use control groups to construct a counterfactual for the treated group, estimating what the trend would have been had the treatment not happened.
These methods are highly sensitive to the input data provided. Like DiD, they also require a good control reference to ensure the output is an accurate representation of the data.
*The difference between Synthetic Control and Causal Impact is that Synthetic Control uses only pre-treatment variables for matching, while Causal Impact uses the full pre- and post-treatment time series of predictors for matching.
Did the remaining keywords pick up volume?
As we can see in Figure 1 below, upon pausing a large set of keywords on October 24th, the volume of our pre-selected keyword rose significantly:
If we then look at the specific search term and how the keywords triggering for that term changed, we can see the following:
Pre/Post, Difference in Differences & Synthetic Control
With all pre-post test approaches that leverage a control group, isolating the most representative ad group(s) to act as the control is key.
The chart below shows the correlation between the selected control and the experiment group, prior to test launch, for conversions**.
**As mentioned previously, with control-based experiment analyses the limiting factor in having complete confidence in the result is finding a control that maps to the experiment as closely as possible across all primary measures. For this reason, tests of this nature are inherently imperfect.
Comparing Pre vs Post, the simplistic approach
The simplest method of analysis is comparing the average number of daily clicks and conversions pre- and post-test. Here, standard deviation is used simply to illustrate the variance in the data:
If we run a simple t-test on the pre vs. post data, we see no significant difference in click volume; the same is true of conversion volume.
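The pre vs. post comparison can be sketched as a two-sample t statistic. The daily click counts below are illustrative placeholders, not the client's data:

```python
# Minimal Welch's t-test sketch on made-up daily click counts;
# a small |t| (roughly below ~2) indicates no significant shift.
from statistics import mean, stdev

pre_clicks  = [120, 135, 110, 140, 125, 130, 118, 128]
post_clicks = [124, 138, 115, 142, 130, 126, 121, 133]

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    va, vb = stdev(a) ** 2, stdev(b) ** 2
    se = (va / len(a) + vb / len(b)) ** 0.5
    return (mean(b) - mean(a)) / se

t = welch_t(pre_clicks, post_clicks)
print(f"t statistic = {t:.2f}")
```

In practice a library routine such as `scipy.stats.ttest_ind` would also return the p-value; the hand-rolled version above just shows what the statistic measures.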
Although this method is a good simple approach to analysing performance, it doesn't account for other contributors to changes in performance, such as:
➡️ Seasonal differences
➡️ Changes in demand
Point view using Difference in Differences
A point view of Difference in Differences (DiD) takes into consideration what occurred in the control group to determine the drivers of the impact observed in the test group. Visually this looks as follows, where β3 represents the intervention effect of the test.
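In the point-estimate form, β3 reduces to a difference of group means: the change in the treatment group minus the change in the control group. A minimal sketch, with illustrative daily conversion figures in place of the real data:

```python
# Point-estimate DiD: beta3 equals
# (treat_post - treat_pre) - (control_post - control_pre).
# All daily conversion figures below are illustrative placeholders.
from statistics import mean

treat_pre    = [20, 22, 19, 21]
treat_post   = [31, 33, 30, 32]
control_pre  = [18, 19, 17, 18]
control_post = [20, 21, 19, 20]

did = (mean(treat_post) - mean(treat_pre)) - (mean(control_post) - mean(control_pre))
print(f"DiD intervention effect (beta3) = {did:.1f} conversions/day")
```

The same estimate can be obtained by regressing the outcome on treatment, period, and their interaction (e.g. with `statsmodels`), where the interaction coefficient is β3.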
Applying this to our model, the output suggests an increase of 8.6 conversions per day under the DiD methodology, with our Control Ad Group as the control. Taking into account the standard deviation of the post-period data, this lies within one standard deviation of the mean, so we can't conclusively say this change improved performance.
Additionally, if we run the same analysis with all relevant Ad Groups as the control, the suggested increase is much smaller: ~3.5 additional conversions per day. This further illustrates the fragility of the methodology.
Synthetic Control approach time-series view
The final view to consider is the Synthetic Control. It involves constructing a weighted combination of the groups used as controls, against which the treatment group is compared.
This comparison is used to estimate what would have happened to the treatment group if it had not received the treatment. As a technique, it is very similar to Causal Impact in estimating the true impact of a treatment. The chart below shows this method applied to the conversion volume over time.
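The mechanics can be sketched in a toy example: fit convex weights over the control series so the weighted combination matches the treated series in the pre-treatment window, then project that combination forward as the counterfactual. All series and the grid-search fitting below are invented for illustration (real implementations solve a constrained optimisation over many controls):

```python
# Toy synthetic-control sketch: find a convex weight over two control
# series that best reproduces the treated series pre-treatment, then
# project the weighted combination forward as the counterfactual.
# All series are invented for illustration.

treated   = [10, 12, 11, 13, 12, 18, 19, 20]  # last 3 points are post-treatment
control_a = [ 9, 11, 10, 12, 11, 12, 12, 13]
control_b = [11, 13, 12, 14, 13, 13, 14, 14]
PRE = 5  # number of pre-treatment periods

def pre_mse(w):
    """Mean squared pre-period error for weight w on control_a (1 - w on control_b)."""
    synth = [w * a + (1 - w) * b for a, b in zip(control_a, control_b)]
    return sum((t - s) ** 2 for t, s in zip(treated[:PRE], synth[:PRE])) / PRE

# Coarse grid search over the convex weight.
best_w = min((i / 100 for i in range(101)), key=pre_mse)
synthetic = [best_w * a + (1 - best_w) * b for a, b in zip(control_a, control_b)]

# Average post-period gap between observed and counterfactual = estimated lift.
effect = sum(t - s for t, s in zip(treated[PRE:], synthetic[PRE:])) / (len(treated) - PRE)
print(f"weight on control_a = {best_w:.2f}, avg post-period lift = {effect:.2f}")
```

Because the weights are fitted entirely on pre-treatment data, the post-period gap between observed and synthetic series is the estimated treatment effect; this is also why the method is so sensitive to the choice and quality of the control pool.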