Summary of the Chapter
In this chapter, we introduced probability as a way to reason about uncertainty and showed how probability distributions describe how data is generated in real-world settings.
We began with the laws of probability, which define how probabilities behave and ensure consistency (written out in symbols below):
- probabilities are always non-negative
- the total probability of all possible outcomes is 1
- probabilities of mutually exclusive events add together
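In symbols, for events $A$ and $B$ in a sample space $\Omega$, these three rules (the Kolmogorov axioms) read:

$$
P(A) \ge 0,
\qquad
P(\Omega) = 1,
\qquad
P(A \cup B) = P(A) + P(B) \quad \text{when } A \cap B = \varnothing .
$$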
We then introduced expected value, which represents the long-run average outcome of a random process and is a key tool for reasoning about uncertain quantities in data science.
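For a discrete random variable, the expected value is the probability-weighted sum of its outcomes; a fair six-sided die gives a quick worked example:

$$
E[X] = \sum_x x \, P(X = x),
\qquad
E[\text{fair die}] = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5 .
$$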
We distinguished between discrete and continuous distributions and covered the most common ones used in data science (see the sampling sketch after this list):
- Bernoulli: models a single yes/no outcome (foundation of binary classification)
- Binomial: models repeated Bernoulli trials (used in A/B testing and conversions)
- Poisson: models event counts over time or space (arrivals, errors, failures)
- Zero-Inflated Poisson: handles count data with many zeros (sparse user activity)
- Uniform: assumes all values in a range are equally likely (simulations, baselines)
- Normal (Gaussian): models noise, error, and averages
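As a rough sketch of how these distributions behave in practice, the snippet below draws samples from each one with NumPy. The parameter values (such as `p=0.3` and `lam=4`) are arbitrary choices for illustration, and because NumPy has no built-in Zero-Inflated Poisson, it is composed here from a Bernoulli zero-mask and an ordinary Poisson:

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n_samples = 10_000

samples = {
    "Bernoulli(p=0.3)":      rng.binomial(n=1, p=0.3, size=n_samples),
    "Binomial(n=20, p=0.3)": rng.binomial(n=20, p=0.3, size=n_samples),
    "Poisson(lam=4)":        rng.poisson(lam=4, size=n_samples),
    "Uniform(0, 1)":         rng.uniform(0.0, 1.0, size=n_samples),
    "Normal(mu=0, sd=1)":    rng.normal(loc=0.0, scale=1.0, size=n_samples),
}

# Zero-Inflated Poisson: with probability pi the count is forced to zero,
# otherwise it is drawn from an ordinary Poisson distribution.
pi, lam = 0.4, 4
forced_zero = rng.random(n_samples) < pi
samples["ZIP(pi=0.4, lam=4)"] = np.where(
    forced_zero, 0, rng.poisson(lam=lam, size=n_samples)
)

for name, x in samples.items():
    print(f"{name:<24} mean={x.mean():6.3f}  var={x.var():6.3f}")
```

The printed summaries already hint at each distribution's signature: the Poisson mean and variance are roughly equal, while the zero-inflated version has a smaller mean but is overdispersed (variance larger than the mean).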
We also introduced the Central Limit Theorem, which explains why averages often follow a Normal distribution, even when the original data does not. This idea supports many statistical tools used in data science.
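A small simulation makes the Central Limit Theorem concrete: the Exponential distribution is strongly skewed, yet the means of repeated samples from it cluster around the population mean with spread close to $\sigma/\sqrt{n}$. This is a minimal sketch; the sample size of 50 and the number of repetitions are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# A clearly non-Normal population: the Exponential distribution is heavily skewed.
population = rng.exponential(scale=2.0, size=100_000)

# Repeatedly draw samples of size 50 and record each sample mean.
sample_means = np.array([
    rng.choice(population, size=50, replace=True).mean()
    for _ in range(5_000)
])

print("population mean:      ", population.mean())
print("mean of sample means: ", sample_means.mean())  # close to the population mean
print("std of sample means:  ", sample_means.std())   # close to sigma / sqrt(50)
print("sigma / sqrt(50):     ", population.std() / np.sqrt(50))
```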
A key takeaway is that distribution awareness comes before modeling.
As the sketch after this list illustrates, the choice of distribution affects:
- which assumptions are valid
- which models are appropriate
- which evaluation metrics make sense
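One way to make this concrete is to fit two candidate distributions to the same data and compare how well each explains it. The sketch below uses simulated, hypothetical count data and SciPy to compare a Poisson fit with a default Normal fit via their log-likelihoods:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)

# Hypothetical count data, e.g. daily support-ticket counts (non-negative integers).
counts = rng.poisson(lam=3, size=500)

# Candidate 1: a Poisson model, with the rate estimated by the sample mean.
lam_hat = counts.mean()
ll_poisson = stats.poisson.logpmf(counts, mu=lam_hat).sum()

# Candidate 2: a Normal model, with mean and spread estimated from the data.
mu_hat, sd_hat = counts.mean(), counts.std()
ll_normal = stats.norm.logpdf(counts, loc=mu_hat, scale=sd_hat).sum()

print(f"Poisson log-likelihood: {ll_poisson:.1f}")
print(f"Normal  log-likelihood: {ll_normal:.1f}")
# The Poisson model typically scores higher here, and unlike the Normal it
# assigns no probability to impossible values such as negative or fractional counts.
```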
In data science, effective modeling begins with understanding how data is generated.
Probability distributions provide the language to describe that process clearly and correctly.