Types of Probability Distributions and How They Connect to Data Science#
(1A) Bernoulli Distribution → One Yes/No Outcome#
A Bernoulli distribution models a single binary decision:
yes/no, success/failure, or 1/0.
A random variable \(X\) follows a Bernoulli distribution if:
\(P(X = 1) = p\)
\(P(X = 0) = 1 - p\)
Mean (Expected Value):
\(\mathbb{E}[X] = p\)
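To see why the mean is \(p\): \(X\) equals 1 with probability \(p\) and 0 otherwise, so
\(\mathbb{E}[X] = 1 \cdot p + 0 \cdot (1 - p) = p\)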
Figure: Bernoulli distribution illustration (Slideserve); Bernoulli PMF/outcome illustration (Medium).
```python
import numpy as np

np.random.seed(0)

# Simulate 20,000 Bernoulli(p = 0.3) trials; n = 1 makes each draw a single 0/1 outcome
samples = np.random.binomial(n=1, p=0.3, size=20_000)

# The sample mean estimates E[X] = p, so this should be close to 0.3
samples.mean()
```
(1B) Binomial Distribution → Many Bernoulli Trials#
A Binomial distribution models the number of successes across repeated, independent Bernoulli trials.
Each trial:
has two outcomes (success / failure)
uses the same success probability \(p\)
Mathematically
If \(X \sim \text{Binomial}(n, p)\), then
\(P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}\)
Mean: \(\mathbb{E}[X] = np\)
Quick Intuition (T/F Quiz)#
A quiz has 10 True/False questions.
Each question is a Bernoulli trial.
The total number of correct answers follows:
\(X \sim \text{Binomial}(10, p)\)
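As a quick numeric check, here is a small sketch (assuming a student guesses every question, so \(p = 0.5\)) that evaluates one PMF value and the expected score with `scipy.stats.binom`:

```python
from scipy.stats import binom

# Probability of exactly 7 correct answers out of 10 when guessing (p = 0.5):
# C(10, 7) * 0.5^7 * 0.5^3 ≈ 0.117
print(binom.pmf(7, n=10, p=0.5))

# Expected number of correct answers: n * p = 5
print(binom.mean(n=10, p=0.5))
```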
In data science, it helps answer questions like how many users will click on an ad, or how many tests will pass out of a fixed number of trials.
```python
# Simulated ad clicks for 100 users with an 8% click rate; the mean should be close to np = 8
np.random.binomial(n=100, p=0.08, size=10_000).mean()
```
Visual Probability Mass Function (PMF) Plot#
A PMF tells you how likely each possible value of a discrete random variable is.
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom

n, p = 10, 0.6

# All possible outcomes 0..n and their exact probabilities
k = np.arange(0, n + 1)
pmf = binom.pmf(k, n, p)

plt.figure()
plt.stem(k, pmf)
plt.xlabel("Number of successes (k)")
plt.ylabel("P(X = k)")
plt.title("Binomial PMF (n = 10, p = 0.6)")
plt.show()
```
(1C) Poisson Distribution → Event Counts Over Time or Space#
A Poisson distribution models how many times an event occurs in a fixed interval
(of time, space, area, etc.).
Examples include:
number of emails received per hour
number of website requests per minute
number of errors in a system per day
Poisson = counting random events in a fixed interval at a constant rate.
Mathematically: Poisson Distribution
If \(X \sim \text{Poisson}(\lambda)\), then
\(P(X = k) = \dfrac{e^{-\lambda}\lambda^k}{k!}\)
Mean (Expected Value):
\(\mathbb{E}[X] = \lambda\)
Figure: Poisson probability mass function illustrating how the distribution depends on the average rate \(\lambda\). The x-axis shows the number of events \(k\), and the height of each bar represents \(P(X = k)\). When \(\lambda\) is small, most probability mass is concentrated near \(k = 0\) or \(1\). As \(\lambda\) increases, the distribution shifts to the right and becomes more spread out, reflecting higher and more variable event counts. Source: Wikimedia Commons.
What Does \(\lambda\) Mean? \(\lambda\) (lambda) is the average rate of events per interval.
Example: \(\lambda = 3\) means on average 3 events per interval
In a Poisson distribution, the mean equals the variance: both are \(\lambda\)
What Is \(e\)? \(e \approx 2.718\) is a mathematical constant (Euler’s number). It naturally appears in models involving random arrivals and decay. You do not need to compute it manually; software handles it.
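To make the formula concrete, here is a minimal check (with an assumed \(\lambda = 3\) and \(k = 2\), chosen only for illustration) comparing the formula above against `scipy.stats.poisson`:

```python
import math
from scipy.stats import poisson

lam, k = 3, 2  # assumed example: an average of 3 events per interval, probability of exactly 2

# Direct formula: e^{-lambda} * lambda^k / k!
print(math.exp(-lam) * lam**k / math.factorial(k))  # ≈ 0.224

# Same value via scipy
print(poisson.pmf(k, lam))
```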
Key Assumptions (Very Important)#
The Poisson model assumes:
Independent events: one event does not affect another
Constant average rate (\(\lambda\)): the rate does not change over the interval
Events occur randomly: not in clusters or bursts
If these assumptions fail, Poisson may not be appropriate.
It is useful for modeling arrivals, failures, errors, or requests when events happen independently at a roughly constant rate.
```python
# Simulate 10,000 Poisson counts with an average rate of 3 events per interval
np.random.poisson(lam=3, size=10_000).mean()
```
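The mean-equals-variance property mentioned above is also easy to verify empirically. A quick sketch:

```python
import numpy as np

counts = np.random.poisson(lam=3, size=100_000)

# For a Poisson distribution, both should be close to lambda = 3
print(counts.mean(), counts.var())
```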
(1D) Zero-Inflated Poisson → Excess Zeros in Count Data#
Some real-world count datasets contain many more zeros than a standard Poisson model can explain.
Examples:
many users make zero purchases
many customers file no insurance claims
many days have no system errors

A standard Poisson model assumes zeros occur naturally from random variation. When zeros appear too frequently, this assumption breaks.
Idea: Zero-Inflated Poisson (ZIP)
A Zero-Inflated Poisson model assumes two underlying processes:
(1) Inflation component (Bernoulli-like):
Determines whether an observation is a structural zero
(e.g., an inactive user with no chance of events)
(2) Poisson component:
Models the number of events when activity is possible
This separation distinguishes:
“cannot happen” zeros (structural zeros)
“could happen but didn’t” zeros (random zeros)
Why Does This Matter? If excess zeros are ignored:
Poisson underestimates zeros
model fit degrades
conclusions become misleading
When to Use Zero-Inflated Poisson#
Use a Zero-Inflated Poisson model when count data has far more zeros than a standard Poisson can explain.
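There is no ZIP sampler built into NumPy, but the two-process idea is easy to simulate. A minimal sketch, assuming a structural-zero probability of 0.4 and a Poisson rate of 3 (both illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)
pi, lam, n = 0.4, 3, 10_000  # assumed structural-zero probability and Poisson rate

# Bernoulli-like inflation component: flags the observations that are structural zeros
structural_zero = rng.random(n) < pi

# Poisson component for everyone else
counts = np.where(structural_zero, 0, rng.poisson(lam, n))

# ZIP produces far more zeros than a plain Poisson with the same rate
print((counts == 0).mean())               # roughly pi + (1 - pi) * e^{-3} ≈ 0.43
print((rng.poisson(lam, n) == 0).mean())  # plain Poisson: ≈ e^{-3} ≈ 0.05
```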
(2A) Uniform Distribution → Equally Likely Values#
A Uniform distribution assumes all values in a range are equally likely.
Mathematically: Uniform Distribution
If \(X \sim \text{Uniform}(a, b)\), then
\(f(x) = \dfrac{1}{b-a}\) for \(a \le x \le b\)
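The mean sits at the midpoint of the range, which follows directly from the density:
\(\mathbb{E}[X] = \displaystyle\int_a^b \dfrac{x}{b-a}\,dx = \dfrac{a+b}{2}\)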

Source.Geeksforgeeks.
It is often used as a baseline model for random sampling, simulations, and sanity checks when no additional structure is assumed.
```python
# Uniform(0, 1) samples: the mean should be close to (a + b) / 2 = 0.5
np.random.uniform(0, 1, size=10_000).mean()
```
(2B) Normal (Gaussian) Distribution → Noise, Error, and Aggregated Behavior#
The Normal distribution describes data that clusters around a central value, with fewer observations as you move farther away.
It is often called the bell-shaped curve.
Mathematically
\(X \sim \mathcal{N}(\mu, \sigma^2)\)
where:
Mean (\(\mu\)): the center of the distribution — the average or typical value
Standard deviation (\(\sigma\)): how spread out the values are — larger \(\sigma\) means more variability
For a Normal distribution:
Mean = Median = Mode = \(\mu\)
(the average, middle, and most frequent value coincide)
Intuition#
The Normal distribution appears when:
many small, independent effects add together
we observe averages or measurement noise
This is why it commonly appears in:
sensor and measurement noise
model residuals (errors)
test scores and biological traits
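This additive story can be seen directly in simulation. Here is a small sketch (with arbitrary choices of 50 draws per average and 10,000 averages) showing that averages of independent Uniform(0, 1) values pile up in a bell shape:

```python
import numpy as np

rng = np.random.default_rng(0)

# Average 50 independent Uniform(0, 1) draws, 10,000 times.
# Each underlying draw is flat, but the averages cluster around 0.5 in a bell shape.
averages = rng.uniform(0, 1, size=(10_000, 50)).mean(axis=1)

print(averages.mean())  # close to 0.5
print(averages.std())   # close to sqrt(1/12) / sqrt(50) ≈ 0.041
```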

Bell-shaped Normal distribution showing center and spread. Source: GeeksforGeeks.
The 68–95–99.7 Rule#
The Normal distribution has a predictable spread:
~68% of values lie within ±1σ of the mean
~95% lie within ±2σ
~99.7% lie within ±3σ
This rule helps quickly estimate where most data values fall.
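A quick empirical check of the rule on simulated standard Normal data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=0, scale=1, size=100_000)

# Fraction of samples within 1, 2, and 3 standard deviations of the mean
for k in (1, 2, 3):
    print(f"within ±{k}σ: {(np.abs(x) <= k).mean():.3f}")  # ≈ 0.683, 0.954, 0.997
```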
Why the Bell Shape Appears#
Most values are close to the mean
Extreme values are rare
Data spreads out symmetrically on both sides
Averages are common, extremes are rare, and variability matters.
```python
# Standard Normal samples: the mean should be close to mu = 0
np.random.normal(loc=0, scale=1, size=10_000).mean()
```
Why Distribution Awareness Comes First#
Understanding how data is distributed helps determine:
- which statistical tests are valid
- which machine learning models are appropriate
- which evaluation metrics are meaningful
Using the wrong distributional assumptions can lead to confident but incorrect conclusions, even when the computations are correct.
In data science, modeling starts with understanding the data-generating process, and probability distributions are the language we use to describe it.
---
> **Data Science Connection**
>
> - Bernoulli and Binomial → classification outcomes, A/B testing
> - Poisson and Zero-Inflated Poisson → event counts, sparse data
> - Uniform → random baselines and simulations
> - Normal → noise, error, averages, and model residuals
>
> In practice, **distribution awareness comes before modeling**.
---

