Example Walkthrough#

Let’s return to the food delivery company.

Data:

  • Old system mean = 24.10

  • New system mean = 25.30

  • Sample size = 200 users

  • Sample standard deviation = 5

We run a two-sample t-test. Suppose we get: p-value = 0.02

Interpretation: If the new system truly had no effect, there would be only a 2% chance of observing a difference at least this large.

That is smaller than the conventional threshold of α = 0.05, so we reject the null hypothesis.

Conclusion: Evidence suggests the new recommendation system increases order value.
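The numbers above can be reproduced with a short sketch. The section does not say whether the standard deviation of 5 and the 200 users apply per group, so this assumes both (a pooled two-sample t-test from summary statistics, with the two-sided p-value approximated by the normal distribution since df = 398 is large):

```python
import math

# Summary statistics from the walkthrough
# (assumed: sd = 5 and n = 200 in EACH group -- the text does not specify)
mean_old, mean_new = 24.10, 25.30
sd, n = 5.0, 200

# Pooled two-sample t statistic
se = sd * math.sqrt(1 / n + 1 / n)   # standard error of the difference = 0.5
t_stat = (mean_new - mean_old) / se  # = 1.2 / 0.5 = 2.4

# With df = 398 the t distribution is nearly normal, so approximate
# the two-sided p-value with the standard normal CDF (via erf).
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(t_stat) / math.sqrt(2))))

print(f"t = {t_stat:.2f}, p ≈ {p_value:.3f}")  # t = 2.40, p ≈ 0.016
```

Under these assumptions the p-value lands near the 0.02 quoted in the walkthrough.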

What Hypothesis Testing Does NOT Tell You#

Very important: rejecting the null does NOT prove something is true with certainty. It only means:

The observed data would be unlikely if the null hypothesis were true.

Also:

  • Small p-value ≠ large impact

  • Statistical significance ≠ practical importance

A tiny improvement can still be statistically significant if the sample size is huge.

Always check effect size too.
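For the delivery-company numbers, one standard effect-size measure is Cohen's d: the difference in means divided by the pooled standard deviation. This sketch again assumes the standard deviation of 5 applies to both groups:

```python
mean_old, mean_new, sd = 24.10, 25.30, 5.0

# Cohen's d: standardized mean difference.
# Assuming sd = 5 in both groups, the pooled standard deviation is also 5.
cohens_d = (mean_new - mean_old) / sd
print(f"Cohen's d = {cohens_d:.2f}")  # 0.24 -- "small" by the usual rule of thumb
```

So the difference is statistically significant, yet the standardized effect is small: exactly the distinction this section warns about.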

Two Common Mistakes (Errors)#

Because decisions are made under uncertainty, mistakes can happen.

Type I Error: False Alarm#

Rejecting a true null.

Example: We think the new system works… but it doesn’t.

Probability = α (e.g., 5%)

Type II Error: Missed Detection#

Failing to reject a false null.

Example: The system really works… but we fail to detect it.

Probability = β (and power = 1 − β)

Visualizing Type I and Type II Errors#

Understanding statistical errors becomes much easier when we see them visually. The graph below shows two distributions:

  • The null hypothesis distribution

  • The alternative hypothesis distribution

Because these two distributions overlap, mistakes are possible.

  • Type I error (α) — rejecting a true null hypothesis (false positive)

  • Type II error (β) — failing to reject a false null hypothesis (false negative)

The shaded regions in the figure show where these errors occur. Everything else represents correct decisions.


Figure: Graphical representation of hypothesis testing errors. The blue curve shows the distribution when the null hypothesis (H₀) is true, and the green curve shows the distribution when the alternative hypothesis (H₁) is true. The vertical line indicates the critical value. The shaded area under H₀ beyond this boundary represents the Type I error (α), while the shaded area under H₁ on the non-rejection side represents the Type II error (β).

Source: Wingify (VWO).
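The overlap in the figure can also be explored numerically. The sketch below uses illustrative values only (a one-sided z-test with n = 25, σ = 1, α = 0.05, and a true effect of 0.5 under H₁): it simulates many experiments and counts how often each error actually occurs.

```python
import random
import statistics

random.seed(42)

N_TRIALS, n, sigma = 10_000, 25, 1.0
# One-sided critical value for the sample mean at alpha = 0.05
crit = 1.645 * sigma / n ** 0.5

def rejection_rate(true_mean):
    """Simulate experiments and return the fraction where H0 is rejected."""
    rejections = 0
    for _ in range(N_TRIALS):
        sample_mean = statistics.fmean(
            random.gauss(true_mean, sigma) for _ in range(n)
        )
        if sample_mean > crit:
            rejections += 1
    return rejections / N_TRIALS

alpha_hat = rejection_rate(0.0)      # H0 true  -> rejection rate estimates alpha
beta_hat = 1 - rejection_rate(0.5)   # H1 true  -> non-rejection rate estimates beta

print(f"estimated alpha ≈ {alpha_hat:.3f}")  # close to 0.05
print(f"estimated beta  ≈ {beta_hat:.3f}")   # close to 0.20 for this setup
```

Shrinking α (moving the critical line right) makes β grow, and vice versa; the only way to reduce both is more data or a larger true effect.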


Why This Matters in Data Science#

Hypothesis testing appears everywhere:

  • A/B testing products

  • Evaluating ML model improvements

  • Medical research

  • Marketing campaigns

  • Feature impact analysis

  • Policy evaluation

Any time we ask:

Did this change actually cause an effect?

We use hypothesis testing.

Final Business Interpretation: We answer three questions:

  1. Is the difference statistically real?

  2. How large is the difference?

  3. Is it worth implementing?
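One way to encode the three questions is a small decision helper. This is purely illustrative: the α level and the minimum effect worth acting on are hypothetical business choices, not fixed rules.

```python
def ship_decision(p_value, effect, min_effect, alpha=0.05):
    """Combine statistical significance with practical importance.

    alpha and min_effect are illustrative thresholds a team would choose.
    """
    significant = p_value < alpha      # Q1: is the difference statistically real?
    meaningful = effect >= min_effect  # Q2/Q3: is it large enough to matter?
    return significant and meaningful

# Delivery-company numbers: significant, and worth it if $1 per order matters
print(ship_decision(p_value=0.02, effect=1.20, min_effect=1.00))  # True
# Significant but too small to act on
print(ship_decision(p_value=0.02, effect=0.10, min_effect=1.00))  # False
```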

Data science is not just testing: it is decision making.

In the next section, we will implement a t-test in Python and run a real A/B experiment.