1.1 Discrete Random Variables and Probability Mass Functions
📐 Definition: Discrete Random Variable
A discrete random variable \(X\) is a random variable that takes countable values \(\{x_1, x_2, \ldots\}\).
The Probability Mass Function (PMF) is: \[P(X = x_i) = p_i, \quad \sum_{i} p_i = 1\] It represents the probability of occurrence of each value.
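As a quick check of the definition, the sketch below builds a PMF for a hypothetical biased die (the probabilities are made up for illustration) and verifies the normalization condition before reading off event probabilities:

```python
import numpy as np

# A hypothetical biased die: values and their probabilities (illustrative numbers)
values = np.array([1, 2, 3, 4, 5, 6])
probs = np.array([0.10, 0.10, 0.15, 0.15, 0.20, 0.30])

# A valid PMF must be non-negative and sum to 1
assert np.all(probs >= 0)
assert np.isclose(probs.sum(), 1.0)

# Event probabilities follow directly from the PMF
p_six = probs[values == 6][0]        # P(X = 6)
p_at_least_5 = probs[values >= 5].sum()  # P(X >= 5)
print(f"P(X=6)  = {p_six:.2f}")
print(f"P(X≥5) = {p_at_least_5:.2f}")
```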
💻 Code Example 1: Visualization of Discrete Probability Distribution
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
# Dice example (uniform discrete distribution)
outcomes = np.arange(1, 7) # 1 to 6
probabilities = np.ones(6) / 6 # Probability for each face is 1/6
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# PMF bar graph
axes[0].bar(outcomes, probabilities, color='#667eea', alpha=0.7, edgecolor='black')
axes[0].set_xlabel('Outcome (x)', fontsize=12)
axes[0].set_ylabel('Probability P(X=x)', fontsize=12)
axes[0].set_title('Probability Mass Function (PMF) of a Die', fontsize=14, fontweight='bold')
axes[0].set_xticks(outcomes)
axes[0].grid(axis='y', alpha=0.3)
# Cumulative Distribution Function (CDF)
cdf = np.cumsum(probabilities)
axes[1].step(outcomes, cdf, where='post', color='#764ba2', linewidth=2)
axes[1].scatter(outcomes, cdf, color='#764ba2', s=50, zorder=5)
axes[1].set_xlabel('Outcome (x)', fontsize=12)
axes[1].set_ylabel('Cumulative Probability P(X≤x)', fontsize=12)
axes[1].set_title('Cumulative Distribution Function (CDF)', fontsize=14, fontweight='bold')
axes[1].set_xticks(outcomes)
axes[1].grid(alpha=0.3)
axes[1].set_ylim([0, 1.1])
plt.tight_layout()
plt.show()
# Calculation of expectation and variance
E_X = np.sum(outcomes * probabilities)
E_X2 = np.sum(outcomes**2 * probabilities)
Var_X = E_X2 - E_X**2
print("Statistics of a Die:")
print(f"Expectation E[X] = {E_X:.4f}")
print(f"Variance Var(X) = {Var_X:.4f}")
print(f"Standard Deviation σ = {np.sqrt(Var_X):.4f}")
Statistics of a Die:
Expectation E[X] = 3.5000
Variance Var(X) = 2.9167
Standard Deviation σ = 1.7078
1.2 Continuous Random Variables and Probability Density Functions
📐 Definition: Continuous Random Variable
A continuous random variable \(X\) takes real values and is characterized by a Probability Density Function (PDF) \(f(x)\): \[P(a \leq X \leq b) = \int_a^b f(x) \, dx, \quad \int_{-\infty}^{\infty} f(x) \, dx = 1\] The Cumulative Distribution Function (CDF) is: \[F(x) = P(X \leq x) = \int_{-\infty}^x f(t) \, dt\]
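The defining properties can be verified numerically; the sketch below (using N(0, 1) as an example) checks that the PDF integrates to 1 and that an interval probability computed by integration matches the CDF difference:

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

dist = stats.norm(0, 1)

# Total probability: the PDF integrated over the real line should be 1
total, _ = quad(dist.pdf, -np.inf, np.inf)
print(f"∫ f(x) dx = {total:.6f}")

# P(a ≤ X ≤ b) computed two ways: numerical integration vs. F(b) - F(a)
a, b = -1.0, 2.0
by_quad, _ = quad(dist.pdf, a, b)
by_cdf = dist.cdf(b) - dist.cdf(a)
print(f"quad: {by_quad:.6f}, CDF difference: {by_cdf:.6f}")
```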
💻 Code Example 2: Continuous Probability Distribution and PDF/CDF
# Example of uniform distribution U(0, 1)
x = np.linspace(-0.5, 1.5, 1000)
uniform_dist = stats.uniform(loc=0, scale=1)
pdf = uniform_dist.pdf(x)
cdf = uniform_dist.cdf(x)
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# PDF
axes[0].plot(x, pdf, color='#667eea', linewidth=2.5, label='PDF')
axes[0].fill_between(x, 0, pdf, where=(x >= 0.3) & (x <= 0.7),
                     alpha=0.3, color='#764ba2', label='P(0.3 ≤ X ≤ 0.7)')
axes[0].set_xlabel('x', fontsize=12)
axes[0].set_ylabel('f(x)', fontsize=12)
axes[0].set_title('Probability Density Function of Uniform Distribution U(0,1)', fontsize=14, fontweight='bold')
axes[0].legend(fontsize=10)
axes[0].grid(alpha=0.3)
axes[0].set_ylim([0, 1.5])
# CDF
axes[1].plot(x, cdf, color='#764ba2', linewidth=2.5)
axes[1].axhline(y=0.7, color='red', linestyle='--', alpha=0.5, label='P(X≤0.7) = 0.7')
axes[1].axvline(x=0.7, color='red', linestyle='--', alpha=0.5)
axes[1].set_xlabel('x', fontsize=12)
axes[1].set_ylabel('F(x)', fontsize=12)
axes[1].set_title('Cumulative Distribution Function (CDF)', fontsize=14, fontweight='bold')
axes[1].legend(fontsize=10)
axes[1].grid(alpha=0.3)
plt.tight_layout()
plt.show()
# Probability calculation
prob = uniform_dist.cdf(0.7) - uniform_dist.cdf(0.3)
print(f"P(0.3 ≤ X ≤ 0.7) = {prob:.4f}")
P(0.3 ≤ X ≤ 0.7) = 0.4000
1.3 Expectation, Variance, and Moments
📊 Theorem: Expectation and Variance
Expectation (Mean): \[E[X] = \sum_i x_i p_i \quad (\text{discrete}), \quad E[X] = \int_{-\infty}^{\infty} x f(x) \, dx \quad (\text{continuous})\] Variance: \[Var(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2\] k-th Moment: \[E[X^k] = \sum_i x_i^k p_i \quad (\text{discrete}), \quad E[X^k] = \int_{-\infty}^{\infty} x^k f(x) \, dx \quad (\text{continuous})\]
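The identity Var(X) = E[X²] − (E[X])² can be checked directly from raw moments; the sketch below does so for an exponential distribution with λ = 2, where the theory gives E[X] = 1/λ and Var(X) = 1/λ²:

```python
import numpy as np
from scipy import stats

lam = 2.0
dist = stats.expon(scale=1/lam)  # scipy parameterizes Exp(λ) via scale = 1/λ

EX = dist.moment(1)   # first raw moment E[X]
EX2 = dist.moment(2)  # second raw moment E[X²]
var_from_moments = EX2 - EX**2

print(f"E[X]   = {EX:.4f}  (theory 1/λ  = {1/lam:.4f})")
print(f"Var(X) = {var_from_moments:.4f}  (theory 1/λ² = {1/lam**2:.4f})")
```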
💻 Code Example 3: Calculation of Expectation and Variance
# Calculate expectation and variance for multiple distributions
distributions = {
'Uniform U(0,1)': stats.uniform(0, 1),
'Normal N(5,2²)': stats.norm(5, 2),
'Exponential Exp(λ=1)': stats.expon(scale=1),
'Beta Beta(2,5)': stats.beta(2, 5)
}
print("Expectation and Variance of Each Distribution:\n" + "="*50)
for name, dist in distributions.items():
    mean = dist.mean()
    var = dist.var()
    std = dist.std()
    # k-th raw moments (k = 3, 4); frozen scipy distributions all provide .moment()
    third_moment = dist.moment(3)
    fourth_moment = dist.moment(4)
    print(f"\n{name}:")
    print(f"  Expectation E[X] = {mean:.4f}")
    print(f"  Variance Var(X) = {var:.4f}")
    print(f"  Standard Deviation σ = {std:.4f}")
    print(f"  3rd moment E[X³] = {third_moment:.4f}, 4th moment E[X⁴] = {fourth_moment:.4f}")
# Visualization
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
axes = axes.flatten()
x_ranges = [
np.linspace(0, 1, 500),
np.linspace(-1, 11, 500),
np.linspace(0, 6, 500),
np.linspace(0, 1, 500)
]
for i, ((name, dist), x_range) in enumerate(zip(distributions.items(), x_ranges)):
    pdf = dist.pdf(x_range)
    mean = dist.mean()
    std = dist.std()
    axes[i].plot(x_range, pdf, color='#667eea', linewidth=2.5, label='PDF')
    axes[i].axvline(mean, color='red', linestyle='--', linewidth=2, label=f'E[X]={mean:.2f}')
    axes[i].axvline(mean - std, color='orange', linestyle=':', alpha=0.7, label=f'σ={std:.2f}')
    axes[i].axvline(mean + std, color='orange', linestyle=':', alpha=0.7)
    axes[i].fill_between(x_range, 0, pdf, alpha=0.2, color='#764ba2')
    axes[i].set_xlabel('x', fontsize=11)
    axes[i].set_ylabel('f(x)', fontsize=11)
    axes[i].set_title(name, fontsize=12, fontweight='bold')
    axes[i].legend(fontsize=9)
    axes[i].grid(alpha=0.3)
plt.tight_layout()
plt.show()
1.4 Binomial and Poisson Distributions
📐 Definition: Binomial and Poisson Distributions
Binomial Distribution \(B(n, p)\): Distribution of the number of successes X in n independent trials with success probability p \[P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad E[X] = np, \quad Var(X) = np(1-p)\] Poisson Distribution \(Pois(\lambda)\): When the average number of occurrences per unit time is λ \[P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad E[X] = Var(X) = \lambda\] The Poisson distribution is obtained as the limit of the binomial distribution (\(n \to \infty, p \to 0, np = \lambda\)).
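The limit statement can be made concrete numerically: holding λ = np fixed while n grows, the binomial PMF should approach the Poisson PMF. The sketch below tracks the maximum pointwise error over a range of k:

```python
import numpy as np
from scipy import stats

# Poisson limit of the binomial: B(n, λ/n) → Pois(λ) as n → ∞ with λ = np fixed
lam = 3.0
k = np.arange(0, 15)
pois_pmf = stats.poisson(lam).pmf(k)

errors = []
for n in [10, 100, 1000, 10000]:
    max_err = np.max(np.abs(stats.binom(n, lam / n).pmf(k) - pois_pmf))
    errors.append(max_err)
    print(f"n = {n:>5}: max |binomial PMF - Poisson PMF| = {max_err:.6f}")
```

The error shrinks roughly in proportion to p = λ/n, which is why the approximation is recommended for large n and small p.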
💻 Code Example 4: Comparison of Binomial and Poisson Distributions
# Comparison of binomial and Poisson distributions
n = 100 # Number of trials
p = 0.03 # Success probability
lam = n * p # λ = np = 3
binomial = stats.binom(n, p)
poisson = stats.poisson(lam)
k = np.arange(0, 15)
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# PMF comparison
axes[0].bar(k - 0.2, binomial.pmf(k), width=0.4, alpha=0.7,
            color='#667eea', label=f'Binomial B({n},{p})')
axes[0].bar(k + 0.2, poisson.pmf(k), width=0.4, alpha=0.7,
            color='#764ba2', label=f'Poisson Pois({lam})')
axes[0].set_xlabel('k (Number of Successes)', fontsize=12)
axes[0].set_ylabel('Probability P(X=k)', fontsize=12)
axes[0].set_title('Comparison of Binomial and Poisson Distributions', fontsize=14, fontweight='bold')
axes[0].legend(fontsize=10)
axes[0].grid(axis='y', alpha=0.3)
# Visualization of error
error = np.abs(binomial.pmf(k) - poisson.pmf(k))
axes[1].bar(k, error, color='red', alpha=0.6)
axes[1].set_xlabel('k', fontsize=12)
axes[1].set_ylabel('|P_binomial - P_poisson|', fontsize=12)
axes[1].set_title('Approximation Error', fontsize=14, fontweight='bold')
axes[1].grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()
# Comparison of statistics
print("Comparison of Statistics:")
print(f"Binomial: E[X]={binomial.mean():.4f}, Var(X)={binomial.var():.4f}")
print(f"Poisson: E[X]={poisson.mean():.4f}, Var(X)={poisson.var():.4f}")
# Application example in materials science: Defect occurrence probability
print("\n[Application Example in Materials Science]")
print("When an average of 3 defects occur per cm² in a thin film deposition process:")
area = 5 # cm²
lam_defects = 3 * area
poisson_defects = stats.poisson(lam_defects)
print(f"Probability of 10 or fewer defects in a 5cm² area: {poisson_defects.cdf(10):.4f}")
print(f"Probability of 15 or more defects in a 5cm² area: {1 - poisson_defects.cdf(14):.4f}")
Comparison of Statistics:
Binomial: E[X]=3.0000, Var(X)=2.9100
Poisson: E[X]=3.0000, Var(X)=3.0000
[Application Example in Materials Science]
When an average of 3 defects occur per cm² in a thin film deposition process:
Probability of 10 or fewer defects in a 5cm² area: 0.1185
Probability of 15 or more defects in a 5cm² area: 0.5289
1.5 Normal Distribution (Gaussian Distribution)
📐 Definition: Normal Distribution
Probability density function of the normal distribution \(N(\mu, \sigma^2)\): \[f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\] Expectation \(E[X] = \mu\), Variance \(Var(X) = \sigma^2\)
Standard Normal Distribution: \(N(0, 1)\) is a normal distribution with \(\mu=0, \sigma=1\)
Any normal distribution can be transformed to a standard normal distribution through the standardization transformation \(Z = \frac{X - \mu}{\sigma}\).
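The point of standardization is that any probability about X ~ N(μ, σ²) can be answered with the standard normal CDF. The sketch below computes the same probability before and after the transformation Z = (X − μ)/σ:

```python
import numpy as np
from scipy import stats

# Standardization: if X ~ N(μ, σ²), then Z = (X - μ)/σ ~ N(0, 1)
mu, sigma = 5.0, 2.0
x_value = 7.0
z = (x_value - mu) / sigma  # here z = 1.0

# The same probability computed directly and via the standard normal
p_original = stats.norm(mu, sigma).cdf(x_value)
p_standard = stats.norm(0, 1).cdf(z)
print(f"P(X ≤ {x_value}) = {p_original:.6f}")
print(f"P(Z ≤ {z})       = {p_standard:.6f}")
```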
💻 Code Example 5: Properties and Applications of Normal Distribution
# Visualization of normal distribution
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# (1) Effect of different μ
x = np.linspace(-10, 10, 1000)
mus = [0, 2, -2]
sigma = 1
axes[0, 0].set_title('Effect of Mean μ (σ=1 fixed)', fontsize=12, fontweight='bold')
for mu in mus:
    dist = stats.norm(mu, sigma)
    axes[0, 0].plot(x, dist.pdf(x), linewidth=2, label=f'μ={mu}')
axes[0, 0].set_xlabel('x', fontsize=11)
axes[0, 0].set_ylabel('f(x)', fontsize=11)
axes[0, 0].legend()
axes[0, 0].grid(alpha=0.3)
# (2) Effect of different σ
mu = 0
sigmas = [0.5, 1, 2]
axes[0, 1].set_title('Effect of Standard Deviation σ (μ=0 fixed)', fontsize=12, fontweight='bold')
for sigma in sigmas:
    dist = stats.norm(mu, sigma)
    axes[0, 1].plot(x, dist.pdf(x), linewidth=2, label=f'σ={sigma}')
axes[0, 1].set_xlabel('x', fontsize=11)
axes[0, 1].set_ylabel('f(x)', fontsize=11)
axes[0, 1].legend()
axes[0, 1].grid(alpha=0.3)
# (3) 68-95-99.7 Rule (3σ Rule)
mu, sigma = 0, 1
dist = stats.norm(mu, sigma)
x_range = np.linspace(-4, 4, 1000)
axes[1, 0].plot(x_range, dist.pdf(x_range), 'black', linewidth=2)
axes[1, 0].fill_between(x_range, 0, dist.pdf(x_range),
                        where=(x_range >= -sigma) & (x_range <= sigma),
                        alpha=0.3, color='green', label='68% (±1σ)')
axes[1, 0].fill_between(x_range, 0, dist.pdf(x_range),
                        where=((x_range >= -2*sigma) & (x_range < -sigma)) |
                              ((x_range > sigma) & (x_range <= 2*sigma)),
                        alpha=0.3, color='yellow', label='95% (±2σ)')
axes[1, 0].fill_between(x_range, 0, dist.pdf(x_range),
                        where=((x_range >= -3*sigma) & (x_range < -2*sigma)) |
                              ((x_range > 2*sigma) & (x_range <= 3*sigma)),
                        alpha=0.3, color='orange', label='99.7% (±3σ)')
axes[1, 0].set_title('68-95-99.7 Rule (3σ Rule)', fontsize=12, fontweight='bold')
axes[1, 0].set_xlabel('x', fontsize=11)
axes[1, 0].set_ylabel('f(x)', fontsize=11)
axes[1, 0].legend(fontsize=9)
axes[1, 0].grid(alpha=0.3)
# (4) Standardization and percentiles
standard_normal = stats.norm(0, 1)
percentiles = [0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99]
z_values = [standard_normal.ppf(p) for p in percentiles]
axes[1, 1].plot(x_range, standard_normal.pdf(x_range), 'black', linewidth=2)
for p, z in zip(percentiles, z_values):
    axes[1, 1].axvline(z, color='red', linestyle='--', alpha=0.5)
    axes[1, 1].text(z, 0.4, f'{int(p*100)}%', fontsize=8, ha='center')
axes[1, 1].set_title('Percentiles of Standard Normal Distribution', fontsize=12, fontweight='bold')
axes[1, 1].set_xlabel('z', fontsize=11)
axes[1, 1].set_ylabel('φ(z)', fontsize=11)
axes[1, 1].grid(alpha=0.3)
plt.tight_layout()
plt.show()
# Probability calculation
print("Probability Calculation in Normal Distribution N(100, 15²):")
mu, sigma = 100, 15
exam_scores = stats.norm(mu, sigma)
print(f"P(X ≥ 120) = {1 - exam_scores.cdf(120):.4f}")
print(f"P(85 ≤ X ≤ 115) = {exam_scores.cdf(115) - exam_scores.cdf(85):.4f}")
# Standardization
x_value = 120
z_score = (x_value - mu) / sigma
print(f"\nz-score for X=120: z = {z_score:.4f}")
print(f"P(Z ≥ {z_score:.4f}) = {1 - standard_normal.cdf(z_score):.4f}")
1.6 Exponential Distribution and Waiting Time Problems
📐 Definition: Exponential Distribution
The exponential distribution \(Exp(\lambda)\) represents the waiting time between events in a Poisson process: \[f(x) = \lambda e^{-\lambda x} \quad (x \geq 0), \quad F(x) = 1 - e^{-\lambda x}\] Expectation \(E[X] = \frac{1}{\lambda}\), Variance \(Var(X) = \frac{1}{\lambda^2}\)
Memoryless Property: \(P(X > s+t \mid X > s) = P(X > t)\) (past information does not affect the future)
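The claim that exponential waiting times and Poisson counts describe the same process can be checked by simulation. The sketch below generates arrival times with Exp(λ) gaps and verifies that counts per unit interval behave like Pois(λ), i.e. mean ≈ variance ≈ λ:

```python
import numpy as np

# Simulation sketch: Exp(λ) inter-arrival gaps imply Pois(λ) counts per unit time
rng = np.random.default_rng(0)
lam = 2.0                                  # average of 2 events per unit time
gaps = rng.exponential(1/lam, size=200_000)
arrival_times = np.cumsum(gaps)

T = int(arrival_times[-1])                 # observe over whole unit intervals only
counts = np.bincount(arrival_times[arrival_times < T].astype(int), minlength=T)

# For a Poisson distribution, mean and variance are both λ
print(f"mean count = {counts.mean():.3f}, variance = {counts.var():.3f} (theory: λ = {lam})")
```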
💻 Code Example 6: Exponential Distribution and Waiting Time Problems
# Simulation of exponential distribution
lambda_rate = 0.5 # Average arrival rate (0.5 times per hour = once every 2 hours)
exp_dist = stats.expon(scale=1/lambda_rate)
# Visualization of PDF and CDF
x = np.linspace(0, 10, 1000)
pdf = exp_dist.pdf(x)
cdf = exp_dist.cdf(x)
fig, axes = plt.subplots(1, 3, figsize=(16, 5))
# (1) PDF and CDF
axes[0].plot(x, pdf, color='#667eea', linewidth=2.5, label='PDF')
axes[0].fill_between(x, 0, pdf, alpha=0.2, color='#667eea')
axes[0].set_xlabel('Time (hours)', fontsize=11)
axes[0].set_ylabel('f(x)', fontsize=11)
axes[0].set_title('Probability Density Function of Exponential Distribution', fontsize=12, fontweight='bold')
axes[0].grid(alpha=0.3)
axes[0].legend()
axes[1].plot(x, cdf, color='#764ba2', linewidth=2.5, label='CDF')
axes[1].axhline(y=0.5, color='red', linestyle='--', alpha=0.5, label='Median')
median = exp_dist.median()
axes[1].axvline(x=median, color='red', linestyle='--', alpha=0.5)
axes[1].set_xlabel('Time (hours)', fontsize=11)
axes[1].set_ylabel('F(x)', fontsize=11)
axes[1].set_title('Cumulative Distribution Function', fontsize=12, fontweight='bold')
axes[1].grid(alpha=0.3)
axes[1].legend()
# (2) Verification of memoryless property
# Verify that P(X > 3+2 | X > 3) = P(X > 2)
t = 2
s = 3
prob_uncond = 1 - exp_dist.cdf(t) # P(X > 2)
prob_cond = (1 - exp_dist.cdf(s+t)) / (1 - exp_dist.cdf(s)) # P(X > 5 | X > 3)
print("Verification of Memoryless Property:")
print(f"P(X > {t}) = {prob_uncond:.6f}")
print(f"P(X > {s+t} | X > {s}) = {prob_cond:.6f}")
print(f"Difference: {abs(prob_uncond - prob_cond):.10f}")
# (3) Simulation
np.random.seed(42)
n_simulations = 10000
samples = exp_dist.rvs(size=n_simulations)
axes[2].hist(samples, bins=50, density=True, alpha=0.6,
             color='#764ba2', edgecolor='black', label='Simulation')
axes[2].plot(x, pdf, color='#667eea', linewidth=2.5, label='Theoretical PDF')
axes[2].set_xlabel('Time (hours)', fontsize=11)
axes[2].set_ylabel('Density', fontsize=11)
axes[2].set_title(f'Simulation (n={n_simulations})', fontsize=12, fontweight='bold')
axes[2].legend()
axes[2].grid(alpha=0.3)
plt.tight_layout()
plt.show()
# Statistics
print(f"\nTheoretical: E[X]={exp_dist.mean():.4f}, Var(X)={exp_dist.var():.4f}")
print(f"Observed: E[X]={np.mean(samples):.4f}, Var(X)={np.var(samples):.4f}")
# Application in materials science: Equipment failure intervals
print("\n[Application Example in Materials Science]")
print("When the failure interval of manufacturing equipment follows exponential distribution Exp(λ=0.1 [1/day]):")
lambda_failure = 0.1
failure_dist = stats.expon(scale=1/lambda_failure)
print(f"Mean Time Between Failures (MTBF): {failure_dist.mean():.2f} days")
print(f"Probability of failure within 7 days: {failure_dist.cdf(7):.4f}")
print(f"Probability of no failure for 30 days or more: {1 - failure_dist.cdf(30):.4f}")
Verification of Memoryless Property:
P(X > 2) = 0.367879
P(X > 5 | X > 3) = 0.367879
Difference: 0.0000000000
Theoretical: E[X]=2.0000, Var(X)=4.0000
Observed: E[X]=2.0124, Var(X)=4.0346
[Application Example in Materials Science]
When the failure interval of manufacturing equipment follows exponential distribution Exp(λ=0.1 [1/day]):
Mean Time Between Failures (MTBF): 10.00 days
Probability of failure within 7 days: 0.5034
Probability of no failure for 30 days or more: 0.0498
1.7 Utilizing Statistical Distributions with scipy.stats
💻 Code Example 7: Utilizing Statistical Distributions with scipy.stats
# Various distributions available in scipy.stats
from scipy.stats import (
bernoulli, binom, poisson, geom, # Discrete distributions
uniform, norm, expon, gamma, beta, weibull_min # Continuous distributions
)
# Unified interface
distributions = {
'Bernoulli': bernoulli(p=0.3),
'Binomial': binom(n=10, p=0.3),
'Poisson': poisson(mu=3),
'Geometric': geom(p=0.3),
'Uniform': uniform(loc=0, scale=1),
'Normal': norm(loc=0, scale=1),
'Exponential': expon(scale=2),
'Gamma': gamma(a=2, scale=2),
'Beta': beta(a=2, b=5),
'Weibull': weibull_min(c=1.5, scale=1)
}
print("Statistical Distribution Operations with scipy.stats:\n" + "="*60)
# Key methods for each distribution
example_dist = norm(loc=5, scale=2)
print("\nKey Methods (Example with Normal Distribution N(5, 2²)):")
print(f" .mean() Mean: {example_dist.mean():.4f}")
print(f" .var() Variance: {example_dist.var():.4f}")
print(f" .std() Standard Deviation: {example_dist.std():.4f}")
print(f" .pdf(6) Probability Density (x=6): {example_dist.pdf(6):.6f}")
print(f" .cdf(7) Cumulative Probability (x≤7): {example_dist.cdf(7):.6f}")
print(f" .ppf(0.95) 95th Percentile: {example_dist.ppf(0.95):.4f}")
print(f" .rvs(5) Random Number Generation (5 samples): {example_dist.rvs(5)}")
# Characteristics of each distribution
print("\n\nStatistics for Each Distribution:")
print(f"{'Distribution':<12} {'Mean':>10} {'Variance':>10} {'Skewness':>10} {'Kurtosis':>10}")
print("="*60)
for name, dist in distributions.items():
    try:
        mean = dist.mean()
        var = dist.var()
        # .stats() returns 0-d arrays; convert to float for string formatting
        skew = float(dist.stats(moments='s'))
        kurt = float(dist.stats(moments='k'))
        print(f"{name:<12} {mean:>10.4f} {var:>10.4f} {skew:>10.4f} {kurt:>10.4f}")
    except Exception:
        print(f"{name:<12} (Cannot calculate)")
# Calculation of quantiles (confidence intervals, etc.)
print("\n\nQuantiles of Standard Normal Distribution N(0,1):")
alpha_levels = [0.90, 0.95, 0.99]
standard_normal = norm(0, 1)
for alpha in alpha_levels:
    lower = standard_normal.ppf((1-alpha)/2)
    upper = standard_normal.ppf((1+alpha)/2)
    print(f"{int(alpha*100)}% Confidence Interval: [{lower:.4f}, {upper:.4f}]")
# Goodness-of-fit test (Kolmogorov-Smirnov test)
from scipy.stats import kstest
np.random.seed(42)
sample_data = norm(loc=10, scale=2).rvs(1000)
# Hypothesis: Data follows N(10, 2²)
ks_stat, p_value = kstest(sample_data, lambda x: norm(10, 2).cdf(x))
print(f"\n\nKolmogorov-Smirnov Goodness-of-Fit Test:")
print(f"Test Statistic: {ks_stat:.6f}")
print(f"p-value: {p_value:.6f}")
print(f"Conclusion: {'Cannot reject null hypothesis (distribution fits)' if p_value > 0.05 else 'Reject null hypothesis (distribution does not fit)'}")
# Application in materials science: Weibull distribution analysis of material strength
print("\n\n[Application Example in Materials Science] Strength Distribution of Ceramics (Weibull Distribution):")
shape_param = 10 # Shape parameter (larger value means more uniform)
scale_param = 500 # Scale parameter (characteristic strength)
weibull = weibull_min(c=shape_param, scale=scale_param)
print(f"Weibull Distribution: shape={shape_param}, scale={scale_param} MPa")
print(f"Mean Strength: {weibull.mean():.2f} MPa")
print(f"Standard Deviation: {weibull.std():.2f} MPa")
print(f"Probability of Fracture at 450 MPa or Below: {weibull.cdf(450):.4f}")
print(f"Design Strength (99% Reliability): {weibull.ppf(0.01):.2f} MPa")
💡 Note: scipy.stats provides over 70 probability distributions and operates them with a unified interface (pdf, cdf, ppf, rvs, mean, var, etc.). In materials science, Weibull distribution (strength of brittle materials), lognormal distribution (particle size distribution), and gamma distribution (waiting time) are commonly used.
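Of the distributions mentioned in the note, the lognormal can be sketched the same way as the Weibull example above. The parameters below (geometric mean diameter and geometric standard deviation) are illustrative, not from real data:

```python
import numpy as np
from scipy import stats

# Illustrative lognormal particle-size model (assumed parameters, not real data)
median_um = 1.0            # geometric mean diameter in μm
gsd = 1.8                  # geometric standard deviation
s = np.log(gsd)            # scipy's shape parameter is σ of the underlying normal
sizes = stats.lognorm(s=s, scale=median_um)

print(f"Median diameter:       {sizes.median():.3f} μm")
print(f"Mean diameter:         {sizes.mean():.3f} μm")
print(f"Fraction below 0.5 μm: {sizes.cdf(0.5):.4f}")
print(f"90th percentile:       {sizes.ppf(0.9):.3f} μm")
```

Note the parameterization: for `stats.lognorm`, `scale` is the median (geometric mean) and `s` is the standard deviation of log-diameter, a common source of confusion when fitting size data.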
Exercises
📝 Exercise 1: Statistical Distributions in Quality Control
When the defect rate of a product is 2%, for a lot of 100 products:
- Model the distribution of defective products using binomial and Poisson distributions and compare them
- Calculate the probability of having 3 or more defective products
- Find the probability that the number of defective products falls within the range of expectation ± 1 standard deviation
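One possible solution sketch (not the only approach) follows the three bullet points in order:

```python
import numpy as np
from scipy import stats

# Defect rate p = 0.02, lot size n = 100
n, p = 100, 0.02
binom = stats.binom(n, p)
pois = stats.poisson(n * p)  # λ = np = 2

# (1) Compare the two models on the same PMF values
for ki in range(6):
    print(f"k={ki}: binomial {binom.pmf(ki):.4f}, Poisson {pois.pmf(ki):.4f}")

# (2) P(X ≥ 3) = 1 - P(X ≤ 2)
p_at_least_3 = 1 - binom.cdf(2)
print(f"P(X ≥ 3): binomial {p_at_least_3:.4f}, Poisson {1 - pois.cdf(2):.4f}")

# (3) Probability mass inside E[X] ± σ, summed over the integers in that range
mean, std = binom.mean(), binom.std()
ks = np.arange(0, n + 1)
in_range = (ks >= mean - std) & (ks <= mean + std)
p_within = binom.pmf(ks[in_range]).sum()
print(f"P({mean - std:.2f} ≤ X ≤ {mean + std:.2f}) = {p_within:.4f}")
```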
📝 Exercise 2: Measurement Error Analysis Using Normal Distribution
When measurement equipment error follows normal distribution N(0, 0.5²):
- Calculate the probability that the error is within ±1 mm
- Find the 95% confidence interval
- Calculate the probability that the absolute value of the error exceeds 0.8 mm
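One possible solution sketch (not the only approach), using the symmetry of the normal distribution for the tail probability:

```python
from scipy import stats

# Measurement error ~ N(0, 0.5²), in mm
err = stats.norm(0, 0.5)

# (1) Probability that the error lies within ±1 mm
p_within_1mm = err.cdf(1) - err.cdf(-1)
print(f"P(|E| ≤ 1 mm)   = {p_within_1mm:.4f}")

# (2) Central 95% interval from the 2.5th and 97.5th percentiles
lower, upper = err.ppf(0.025), err.ppf(0.975)
print(f"95% interval    = [{lower:.4f}, {upper:.4f}] mm")

# (3) P(|E| > 0.8 mm): two symmetric tails
p_exceed = 2 * (1 - err.cdf(0.8))
print(f"P(|E| > 0.8 mm) = {p_exceed:.4f}")
```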
📝 Exercise 3: Failure Analysis Using Exponential Distribution
When the lifespan of electronic components follows exponential distribution Exp(λ=0.001 [1/hour]):
- Calculate the mean lifespan (MTBF)
- Find the probability of operating for 1000 hours or more
- Using the memoryless property, calculate the probability that a component that has already operated for 500 hours will operate for an additional 1000 hours or more
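One possible solution sketch (not the only approach); the last step verifies that the conditional probability from the memoryless property equals the unconditional one:

```python
from scipy import stats

# Component lifespan ~ Exp(λ = 0.001 per hour)
lam = 0.001
life = stats.expon(scale=1/lam)

# (1) Mean lifespan (MTBF) = 1/λ
mtbf = life.mean()
print(f"MTBF = {mtbf:.0f} hours")

# (2) P(X ≥ 1000) = e^(-λ·1000)
p_1000 = 1 - life.cdf(1000)
print(f"P(X ≥ 1000) = {p_1000:.4f}")

# (3) Memoryless property: P(X ≥ 1500 | X ≥ 500) should equal P(X ≥ 1000)
p_cond = (1 - life.cdf(1500)) / (1 - life.cdf(500))
print(f"P(X ≥ 1500 | X ≥ 500) = {p_cond:.4f}")
```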