Example Walkthrough#

Let’s return to the food delivery company.

Data:

  • Old system mean = 24.10

  • New system mean = 25.30

  • Sample size = 200 users

  • Sample standard deviation = 5

We run a two-sample t-test. Suppose we get: p-value = 0.02

Interpretation: If the new system truly had no effect, there would be only a 2% chance of observing a difference at least this large.

That is smaller than the conventional threshold of α = 0.05, so we reject the null hypothesis.

Conclusion: Evidence suggests the new recommendation system increases order value.
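The numbers above can be reproduced with a short sketch. The section does not say whether the standard deviation of 5 and the 200 users apply per group, so this assumes both (a pooled two-sample t-test from summary statistics, with the two-sided p-value approximated by the normal distribution since df = 398 is large):

```python
import math

# Summary statistics from the walkthrough
# (assumed: sd = 5 and n = 200 in EACH group -- the text does not specify)
mean_old, mean_new = 24.10, 25.30
sd, n = 5.0, 200

# Pooled two-sample t statistic
se = sd * math.sqrt(1 / n + 1 / n)   # standard error of the difference = 0.5
t_stat = (mean_new - mean_old) / se  # = 1.2 / 0.5 = 2.4

# With df = 398 the t distribution is nearly normal, so approximate
# the two-sided p-value with the standard normal CDF (via erf).
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(t_stat) / math.sqrt(2))))

print(f"t = {t_stat:.2f}, p ≈ {p_value:.3f}")  # t = 2.40, p ≈ 0.016
```

Under these assumptions the p-value lands near the 0.02 quoted in the walkthrough.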

What Hypothesis Testing Does NOT Tell You#

Very important: rejecting the null does NOT prove something is true with certainty. It only means:

The observed data would be unlikely if the null hypothesis were true.

Also:

  • Small p-value ≠ large impact

  • Statistical significance ≠ practical importance

A tiny improvement can still be statistically significant if the sample size is huge.

Always check effect size too.
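For the delivery-company numbers, one standard effect-size measure is Cohen's d: the difference in means divided by the pooled standard deviation. This sketch again assumes the standard deviation of 5 applies to both groups:

```python
mean_old, mean_new, sd = 24.10, 25.30, 5.0

# Cohen's d: standardized mean difference.
# Assuming sd = 5 in both groups, the pooled standard deviation is also 5.
cohens_d = (mean_new - mean_old) / sd
print(f"Cohen's d = {cohens_d:.2f}")  # 0.24 -- "small" by the usual rule of thumb
```

So the difference is statistically significant, yet the standardized effect is small: exactly the distinction this section warns about.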

Two Common Mistakes (Errors)#

Because decisions are made under uncertainty, mistakes can happen.

Type I Error: False Alarm#

Rejecting a true null.

Example: We think the new system works… but it doesn’t.

Probability = α (e.g., 5%)

Type II Error: Missed Detection#

Failing to reject a false null.

Example: The system really works… but we fail to detect it.

Probability = β (and power = 1 − β)

Visualizing Type I and Type II Errors#

Understanding statistical errors becomes much easier when we see them visually. The graph below shows two distributions:

  • The null hypothesis distribution

  • The alternative hypothesis distribution

Because these two distributions overlap, mistakes are possible.

  • Type I error (α) — rejecting a true null hypothesis (false positive)

  • Type II error (β) — failing to reject a false null hypothesis (false negative)

The shaded regions in the figure show where these errors occur. Everything else represents correct decisions.


Figure: Graphical representation of hypothesis testing errors. The blue curve shows the distribution when the null hypothesis (H₀) is true, and the green curve shows the distribution when the alternative hypothesis (H₁) is true. The vertical line indicates the critical value. The shaded area under H₀ beyond this boundary represents the Type I error (α), while the shaded area under H₁ on the non-rejection side represents the Type II error (β).

Source: Wingify (VWO).
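The overlap in the figure can also be explored numerically. The sketch below uses illustrative values only (a one-sided z-test with n = 25, σ = 1, α = 0.05, and a true effect of 0.5 under H₁): it simulates many experiments and counts how often each error actually occurs.

```python
import random
import statistics

random.seed(42)

N_TRIALS, n, sigma = 10_000, 25, 1.0
# One-sided critical value for the sample mean at alpha = 0.05
crit = 1.645 * sigma / n ** 0.5

def rejection_rate(true_mean):
    """Simulate experiments and return the fraction where H0 is rejected."""
    rejections = 0
    for _ in range(N_TRIALS):
        sample_mean = statistics.fmean(
            random.gauss(true_mean, sigma) for _ in range(n)
        )
        if sample_mean > crit:
            rejections += 1
    return rejections / N_TRIALS

alpha_hat = rejection_rate(0.0)      # H0 true  -> rejection rate estimates alpha
beta_hat = 1 - rejection_rate(0.5)   # H1 true  -> non-rejection rate estimates beta

print(f"estimated alpha ≈ {alpha_hat:.3f}")  # close to 0.05
print(f"estimated beta  ≈ {beta_hat:.3f}")   # close to 0.20 for this setup
```

Shrinking α (moving the critical line right) makes β grow, and vice versa; the only way to reduce both is more data or a larger true effect.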


Why This Matters in Data Science#

Hypothesis testing appears everywhere:

  • A/B testing products

  • Evaluating ML model improvements

  • Medical research

  • Marketing campaigns

  • Feature impact analysis

  • Policy evaluation

Any time we ask:

Did this change actually cause an effect?

We use hypothesis testing.

Final Business Interpretation: We answer three questions:

  1. Is the difference statistically real?

  2. How large is the difference?

  3. Is it worth implementing?
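One way to encode the three questions is a small decision helper. This is purely illustrative: the α level and the minimum effect worth acting on are hypothetical business choices, not fixed rules.

```python
def ship_decision(p_value, effect, min_effect, alpha=0.05):
    """Combine statistical significance with practical importance.

    alpha and min_effect are illustrative thresholds a team would choose.
    """
    significant = p_value < alpha      # Q1: is the difference statistically real?
    meaningful = effect >= min_effect  # Q2/Q3: is it large enough to matter?
    return significant and meaningful

# Delivery-company numbers: significant, and worth it if $1 per order matters
print(ship_decision(p_value=0.02, effect=1.20, min_effect=1.00))  # True
# Significant but too small to act on
print(ship_decision(p_value=0.02, effect=0.10, min_effect=1.00))  # False
```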

Data science is not just testing: it is decision making.

In the next section, we will implement a t-test in Python and run a real A/B experiment.