BONUS: Python Visualization


import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm, t
from scipy import stats
# Tail Visualization

z = 2.0
x = np.linspace(-4,4,500)
y = norm.pdf(x)

plt.figure(figsize=(8,5))
plt.plot(x,y)

# right tail
plt.fill_between(x, y, where=(x>=z), alpha=0.4, label="Right tail")

# left tail
plt.fill_between(x, y, where=(x<=-z), alpha=0.4, label="Left tail")

plt.title("Two-Tailed Test")
plt.legend()
plt.show()


z = 2.0
x = np.linspace(-4,4,500)
y = norm.pdf(x)

plt.figure(figsize=(8,5))
plt.plot(x,y)
plt.fill_between(x, y, where=(x>=z), alpha=0.4)

plt.title("One-Tailed Test (Right)")
plt.show()
[Figure: Two-Tailed Test]
[Figure: One-Tailed Test (Right)]

Notice:

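  • Two-tailed → the tail probability is split across both ends of the distribution

  • One-tailed → all of the tail probability sits in one direction

One-tailed tests are more powerful for detecting an effect in the hypothesized direction, but that direction must be justified BEFORE analysis, not chosen after looking at the data.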
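
To make the split concrete, the sketch below compares the critical values and tail areas for the two kinds of test. The significance level `alpha = 0.05` is an assumption chosen for illustration; the plots above use a cutoff of z = 2.0 rather than a stated significance level.

```python
from scipy.stats import norm

alpha = 0.05  # assumed significance level (not stated in the text above)

# One-tailed (right): all of alpha sits in the upper tail.
z_one = norm.ppf(1 - alpha)

# Two-tailed: alpha is split evenly, alpha / 2 in each tail.
z_two = norm.ppf(1 - alpha / 2)

print(f"one-tailed critical z: {z_one:.3f}")
print(f"two-tailed critical z: {z_two:.3f}")

# Tail areas for the z = 2.0 cutoff used in the plots.
z = 2.0
one_tail_area = norm.sf(z)       # area to the right of z
two_tail_area = 2 * norm.sf(z)   # area beyond +z plus area beyond -z

print(f"one-tail area beyond z = {z}: {one_tail_area:.4f}")
print(f"two-tail area beyond |z| = {z}: {two_tail_area:.4f}")
```

The one-tailed cutoff (about 1.645) is lower than the two-tailed one (about 1.96), so the same test statistic clears the bar more easily in a one-tailed test. That is the source of the extra power, and also the reason the direction has to be fixed in advance.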

'''
The p-value is the shaded area beyond the test statistic.

It is the probability, assuming the null hypothesis is true, of seeing a result at least as extreme as the one observed.
'''

z_score = 2.1

x = np.linspace(-4,4,500)
y = norm.pdf(x)

plt.figure(figsize=(8,5))
plt.plot(x,y)
plt.fill_between(x, y, where=(x>=z_score), alpha=0.4)

plt.axvline(z_score, linestyle="--")
plt.title("p-value Area (Right Tail)")
plt.show()
[Figure: p-value Area (Right Tail)]
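
The shaded area in the plot above can also be computed directly. The following is a minimal sketch using the survival function `norm.sf`, which is `1 - cdf`; the two-tailed value is included only for comparison and assumes a symmetric two-sided alternative.

```python
from scipy.stats import norm

z_score = 2.1  # same test statistic as in the plot above

# Right-tailed p-value: P(Z >= z_score) under the standard normal.
p_right = norm.sf(z_score)

# Two-tailed p-value: area beyond |z_score| in both tails.
p_two = 2 * norm.sf(abs(z_score))

print(f"right-tailed p-value: {p_right:.4f}")
print(f"two-tailed p-value:   {p_two:.4f}")
```

Using `norm.sf` rather than `1 - norm.cdf(z_score)` avoids loss of precision when the tail area is very small.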
## Understanding Sampling Variation

'''
Why Do We Need Hypothesis Testing?

Even if both systems were identical, sample means would still differ. Let’s simulate that situation.

Assume: Both groups truly have the SAME average. We repeatedly sample users from each group and measure the difference in sample means.

This shows how random variation alone can create differences.
'''

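np.random.seed(0)  # fix the seed so the simulation is reproducible

differences = []

# Repeat the experiment 5000 times with no true difference between groups
for _ in range(5000):
    group1 = np.random.normal(24, 5, 100)  # 100 users, mean 24, SD 5
    group2 = np.random.normal(24, 5, 100)  # drawn from the same distribution
    differences.append(group2.mean() - group1.mean())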

'''
Sampling Distribution of Mean Differences:

This shows the range of differences expected purely by chance.
'''

plt.figure(figsize=(8,5))
plt.hist(differences, bins=40)
plt.xlabel("Difference in Means")
plt.ylabel("Frequency")
plt.title("Sampling Distribution (No Real Effect)")
plt.show()
[Figure: Sampling Distribution (No Real Effect)]
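
The width of this histogram can be checked against theory. For two independent groups of size n with the same standard deviation σ, the standard error of the difference in means is σ·sqrt(2/n). The sketch below repeats the simulation above and compares the simulated spread with that value; the 1.96 cutoff and the names `theoretical_se`, `empirical_sd`, and `frac_extreme` are introduced here for illustration.

```python
import numpy as np

np.random.seed(0)

mu, sigma, n = 24, 5, 100  # same settings as the simulation above

differences = []
for _ in range(5000):
    group1 = np.random.normal(mu, sigma, n)
    group2 = np.random.normal(mu, sigma, n)
    differences.append(group2.mean() - group1.mean())
differences = np.array(differences)

# Standard error of the difference between two independent sample means.
theoretical_se = sigma * np.sqrt(2 / n)

# Spread actually observed across the 5000 simulated experiments.
empirical_sd = differences.std(ddof=1)

# Share of simulated differences beyond +/- 1.96 standard errors.
frac_extreme = np.mean(np.abs(differences) > 1.96 * theoretical_se)

print(f"theoretical SE: {theoretical_se:.3f}")
print(f"empirical SD:   {empirical_sd:.3f}")
print(f"fraction beyond 1.96 SE: {frac_extreme:.3f}")
```

Roughly 5% of the differences should land beyond ±1.96 standard errors even though no real effect exists, which is the false-positive rate a two-tailed test at the 0.05 level accepts.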