AB Test Notes
In an A/B test the experimenter sets up two experiences: “A,” the control, is usually the current
system and considered the “champion,” and “B,” the treatment, is a modification that attempts
to improve something—the “challenger.”
Sensitivity (true positive rate): the ability of a test to correctly identify patients with a disease.
Specificity (true negative rate): the ability of a test to correctly identify people without the
disease.
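As a small illustration of the two definitions above, both rates can be computed from confusion-matrix counts. The counts below are hypothetical and only serve to show the arithmetic.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical counts: 100 people with the disease, 1000 without.
tp, fn = 90, 10    # diseased people the test caught / missed
tn, fp = 950, 50   # healthy people the test cleared / flagged

print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
```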
Beta (β): the false negative rate, i.e. the probability of a Type II error.
Statistical power (true positive rate): the power of a hypothesis test is the probability that the test
correctly rejects the null hypothesis when it is false. Power = 1 − β.
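A rough sketch of how power could be approximated for a two-sample comparison of means. It assumes a two-sided z-test with equal group sizes and a known, common standard deviation, and it ignores the negligible chance of rejecting in the wrong direction. The function name and parameters are illustrative, not from these notes.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(delta: float, sigma: float, n_per_group: int,
                 alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test.

    delta: true difference in means, sigma: common standard deviation
    (both assumed known here for simplicity).
    """
    std_normal = NormalDist()
    se = sigma * sqrt(2 / n_per_group)             # std. error of the difference
    z_crit = std_normal.inv_cdf(1 - alpha / 2)     # two-sided critical value
    return std_normal.cdf(abs(delta) / se - z_crit)


# Power grows with sample size for a fixed effect.
for n in (16, 64, 256):
    print(n, round(approx_power(delta=0.5, sigma=1.0, n_per_group=n), 3))
```

Beta is then simply `1 - approx_power(...)` under the same assumptions.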
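Bonferroni correction:
Adjusted significance level = significance level / number of tests
E.g., with 10 tests at a significance level of 0.05: 0.05/10 = 0.005
This method tends to be too conservative: it controls false positives at the cost of more false negatives.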
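A minimal sketch of the correction in code. The helper names and the p-values are made up for illustration. The family-wise error rate function assumes independent tests and is included only to show why a correction is needed at all.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level under the Bonferroni correction."""
    return alpha / n_tests


def bonferroni_reject(p_values, alpha: float = 0.05):
    """Flag which hypotheses are rejected at the corrected level."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]


def family_wise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across independent, uncorrected tests."""
    return 1 - (1 - alpha) ** n_tests


# Hypothetical p-values from 10 metrics in one experiment.
p_values = [0.001, 0.004, 0.012, 0.03, 0.04, 0.20, 0.35, 0.50, 0.70, 0.90]
print(bonferroni_reject(p_values))                    # only the first two pass
print(round(family_wise_error_rate(0.05, 10), 3))     # uncorrected risk, about 0.4
```

Five of these p-values are below 0.05, but only two survive the corrected threshold, which is the conservatism the note refers to.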
Novelty and primacy effects do not last long. An A/B test can show a larger (novelty) or smaller (primacy) initial effect than the long-run effect.
If a test is very successful initially and the treatment effect declines quickly after a week, the decline
is likely due to the novelty effect.
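One simple way to look for this pattern is to compare the average daily lift in the first week with the average afterwards. The sketch below is a heuristic only: the function name, the 7-day window, the 0.5 decay ratio, and the daily lifts are all assumptions made for illustration.

```python
from statistics import mean


def looks_like_novelty(daily_lifts, early_days: int = 7,
                       decay_ratio: float = 0.5) -> bool:
    """Heuristic: positive early lift that later drops below decay_ratio of it."""
    if len(daily_lifts) <= early_days:
        raise ValueError("need data beyond the early window")
    early = mean(daily_lifts[:early_days])
    late = mean(daily_lifts[early_days:])
    return early > 0 and late < decay_ratio * early


# Hypothetical daily relative lifts (treatment vs control) over two weeks.
fading = [0.08, 0.07, 0.06, 0.05, 0.05, 0.04, 0.04,
          0.01, 0.01, 0.00, 0.01, 0.00, 0.01, 0.00]
stable = [0.03] * 14

print(looks_like_novelty(fading))  # True
print(looks_like_novelty(stable))  # False
```

A check like this says nothing about statistical significance. It only flags experiments worth running longer before drawing a conclusion.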
Network effect
- User behaviors are impacted by others
- The effect can spill over into the control group
- The measured difference underestimates the treatment effect
E.g., We are testing a new feature to increase posts created per user. Users are assigned
randomly. The treatment won by 1% in terms of the number of posts. What will happen when the
new feature is launched to all users? Will the lift still be 1%?
Ans: The difference will be more than 1%. Suppose people in the treatment group post more
often. Their friends, who are in the control group, may also want to post more after seeing
more posts. So the detected effect between control and treatment is actually smaller than what it
should be.
(From trustworthy)
E.g., LinkedIn launches a better “People You May Know” recommender system for the treatment group. If the
primary metric is the number of invitations sent, invitations in both groups are likely to increase. So the
treatment effect (delta) is biased downward.
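The downward bias in both examples can be illustrated with a toy model. It assumes the control group picks up a fixed fraction of the treatment lift through spillover. The function, the `spillover` parameter, and the numbers are hypothetical and not taken from any real experiment.

```python
def observed_lift(baseline: float, true_lift: float, spillover: float) -> float:
    """Relative lift measured between groups when control is contaminated.

    spillover: fraction of the true lift that leaks into control (0 = none).
    """
    treatment_mean = baseline * (1 + true_lift)
    control_mean = baseline * (1 + spillover * true_lift)
    return (treatment_mean - control_mean) / control_mean


# Hypothetical: 10 posts per user at baseline, a true 2% lift,
# and half of that lift leaking into control through friends.
print(f"{observed_lift(10.0, 0.02, 0.5):.4f}")  # roughly 1%, not 2%
print(f"{observed_lift(10.0, 0.02, 0.0):.4f}")  # no spillover: the full 2%
```

Under this toy model a measured 1% win is consistent with a true effect closer to 2%, which matches the direction of the answer above.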
Network mitigation
- Create network clusters
o People interact mostly within the cluster
o Assign clusters randomly to the treatment and control groups
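A minimal sketch of cluster-level randomization, assuming the clusters have already been built (for example by a graph-partitioning step that is not shown). The user-to-cluster mapping and the function names are hypothetical.

```python
import random


def assign_clusters(cluster_ids, seed: int = 0) -> dict:
    """Randomly split clusters (not users) into treatment and control."""
    rng = random.Random(seed)
    shuffled = list(cluster_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    assignment = {c: "treatment" for c in shuffled[:half]}
    assignment.update({c: "control" for c in shuffled[half:]})
    return assignment


def assign_users(user_to_cluster: dict, cluster_assignment: dict) -> dict:
    """Every user inherits the group of their cluster."""
    return {u: cluster_assignment[c] for u, c in user_to_cluster.items()}


# Hypothetical mapping produced by some clustering of the social graph.
user_to_cluster = {"u1": "c1", "u2": "c1", "u3": "c2",
                   "u4": "c3", "u5": "c4", "u6": "c4"}
cluster_groups = assign_clusters(sorted(set(user_to_cluster.values())), seed=42)
user_groups = assign_users(user_to_cluster, cluster_groups)
print(user_groups)
```

Because users who interact mostly with each other land in the same group, spillover between treatment and control is reduced. The unit of analysis becomes the cluster, so the effective sample size is smaller than the user count.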
Two-sided markets
- Resources are shared among control and treatment groups
E.g., Can a coupon make people use Uber more?
Ans: The treatment group attracts more drivers, so fewer drivers are available for the control
group. The measured difference becomes larger than it should be: actual effect < measured treatment effect.
iii. rule of thumb: $2\sigma^2 (z_{\alpha} + z_{\beta})^2 / \delta^2$
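This rule of thumb is commonly read as the required sample size per group, with σ the standard deviation of the metric and δ the minimum difference to detect. The sketch below makes two assumptions that the notes do not state: the test is two-sided, so the critical value for α is taken at α/2, and z_β is the quantile corresponding to the desired power 1 − β.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(sigma: float, delta: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """n = 2 * sigma^2 * (z_alpha + z_beta)^2 / delta^2, rounded up."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # assumption: two-sided test
    z_beta = std_normal.inv_cdf(power)           # power = 1 - beta
    return ceil(2 * sigma ** 2 * (z_alpha + z_beta) ** 2 / delta ** 2)


# Hypothetical metric with sigma = 1 and a minimum detectable difference of 0.1.
print(sample_size_per_group(sigma=1.0, delta=0.1))  # 1570
```

At α = 0.05 and 80% power the factor 2(z_α + z_β)² is about 15.7, which is where the familiar shortcut of roughly 16σ²/δ² comes from.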
Concept Questions
1. What assumptions are made for t-test/hypothesis test?
The common assumptions made when doing a t-test concern the scale of
measurement (continuous or ordinal), random sampling, normality of the data distribution, adequacy of sample size, and
equality (homogeneity) of variance between groups.
A null hypothesis is a type of hypothesis used in statistics that proposes that there is no
difference between certain characteristics of a population.
In statistical hypothesis testing, a result has statistical significance when it is very unlikely to
have occurred given the null hypothesis.
Statistical significance refers to the claim that a result from data generated by testing or
experimentation is not likely to occur randomly or by chance but is instead likely to be
attributable to a specific cause.
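To make the idea of significance concrete, here is a sketch of a two-sided test on conversion rates. It uses a pooled two-proportion z-test rather than a t-test, which is a common large-sample choice for A/B conversion metrics. The function name and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) under H0: both groups convert equally."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical: control converts 1000/10000, treatment 1100/10000.
z, p = two_proportion_z_test(1000, 10_000, 1100, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
print("significant at 0.05" if p < 0.05 else "not significant at 0.05")
```

A small p-value means the observed difference would be unlikely if the null hypothesis were true. It does not by itself identify the cause, which is why randomization and confounders matter.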
Confounders