2.1 Weak Law of Large Numbers
📊 Theorem: Weak Law of Large Numbers
For independent and identically distributed (i.i.d.) random variables \(X_1, X_2, \ldots, X_n\) with expectation \(E[X_i] = \mu\) and variance \(Var(X_i) = \sigma^2 < \infty\), the sample mean \(\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i\) converges in probability to \(\mu\):
\[\lim_{n \to \infty} P(|\bar{X}_n - \mu| > \epsilon) = 0 \quad \text{for any } \epsilon > 0\]
Proof (using Chebyshev's inequality): since \(Var(\bar{X}_n) = \sigma^2/n\),
\[P(|\bar{X}_n - \mu| > \epsilon) \leq \frac{Var(\bar{X}_n)}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2} \to 0 \quad (n \to \infty)\]
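Before simulating, it helps to see how conservative the Chebyshev bound is. A minimal sketch (the normal population and the parameter values are assumptions of this illustration; the exact tail is in closed form only because the data are normal):
# How loose is the Chebyshev bound? For normal data the exact tail
# probability is P(|X̄_n - μ| > ε) = 2·(1 - Φ(ε√n/σ)).
import numpy as np
from scipy import stats

sigma, epsilon = 2.0, 0.5
for n in [10, 100, 1000]:
    chebyshev = sigma**2 / (n * epsilon**2)                  # distribution-free bound (may exceed 1)
    exact = 2 * stats.norm.sf(epsilon * np.sqrt(n) / sigma)  # exact tail for normal data
    print(f"n={n:5d}: Chebyshev bound={chebyshev:.4f}, exact={exact:.6f}")
The bound decays at the correct 1/n rate but can be far above the exact probability, which is what the simulation below also shows.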
💻 Code Example 1: Simulation of Weak Law of Large Numbers
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
# Parameter settings
np.random.seed(42)
mu = 5 # True expected value
sigma = 2 # True standard deviation
max_n = 10000
epsilon = 0.5 # Tolerance
# Sampling (from normal distribution)
samples = np.random.normal(mu, sigma, max_n)
# Calculate cumulative mean
cumulative_mean = np.cumsum(samples) / np.arange(1, max_n + 1)
# Estimate probability of |X̄_n - μ| > ε
n_values = np.logspace(1, 4, 100).astype(int)
n_values = np.unique(n_values) # Remove duplicates
prob_exceed = []
for n in n_values:
    # Multiple simulations
    n_trials = 1000
    exceed_count = 0
    for _ in range(n_trials):
        sample_mean = np.random.normal(mu, sigma, n).mean()
        if abs(sample_mean - mu) > epsilon:
            exceed_count += 1
    prob_exceed.append(exceed_count / n_trials)
# Theoretical value (upper bound from Chebyshev's inequality)
theoretical_bound = sigma**2 / (n_values * epsilon**2)
# Visualization
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# (1) Convergence of sample mean
axes[0, 0].plot(cumulative_mean, color='#667eea', linewidth=1.5, alpha=0.8)
axes[0, 0].axhline(y=mu, color='red', linestyle='--', linewidth=2, label=f'True mean μ={mu}')
axes[0, 0].axhline(y=mu+epsilon, color='orange', linestyle=':', alpha=0.7, label=f'μ±ε (ε={epsilon})')
axes[0, 0].axhline(y=mu-epsilon, color='orange', linestyle=':', alpha=0.7)
axes[0, 0].set_xlabel('Sample size n', fontsize=11)
axes[0, 0].set_ylabel('Sample mean X̄ₙ', fontsize=11)
axes[0, 0].set_title('Convergence of Sample Mean', fontsize=12, fontweight='bold')
axes[0, 0].legend()
axes[0, 0].grid(alpha=0.3)
axes[0, 0].set_xlim([0, max_n])
# (2) Convergence on log scale
axes[0, 1].semilogx(np.arange(1, max_n + 1), cumulative_mean, color='#667eea', linewidth=1.5, alpha=0.8)
axes[0, 1].axhline(y=mu, color='red', linestyle='--', linewidth=2)
axes[0, 1].axhline(y=mu+epsilon, color='orange', linestyle=':', alpha=0.7)
axes[0, 1].axhline(y=mu-epsilon, color='orange', linestyle=':', alpha=0.7)
axes[0, 1].set_xlabel('Sample size n (log)', fontsize=11)
axes[0, 1].set_ylabel('Sample mean X̄ₙ', fontsize=11)
axes[0, 1].set_title('Convergence of Sample Mean (Log Scale)', fontsize=12, fontweight='bold')
axes[0, 1].grid(alpha=0.3, which='both')
# (3) Estimate of P(|X̄ₙ - μ| > ε)
axes[1, 0].loglog(n_values, prob_exceed, 'o-', color='#667eea',
                  linewidth=2, markersize=4, label='Simulation')
axes[1, 0].loglog(n_values, theoretical_bound, '--', color='red',
                  linewidth=2, label='Chebyshev upper bound')
axes[1, 0].set_xlabel('Sample size n', fontsize=11)
axes[1, 0].set_ylabel('P(|X̄ₙ - μ| > ε)', fontsize=11)
axes[1, 0].set_title('Verification of Weak Law of Large Numbers', fontsize=12, fontweight='bold')
axes[1, 0].legend()
axes[1, 0].grid(alpha=0.3, which='both')
# (4) Comparison across different distributions
distributions = {
    'Normal': lambda n: np.random.normal(mu, sigma, n),
    'Uniform': lambda n: np.random.uniform(mu - sigma*np.sqrt(3), mu + sigma*np.sqrt(3), n),
    'Exponential': lambda n: np.random.exponential(mu, n),
}
for name, dist_func in distributions.items():
    samples_dist = dist_func(max_n)
    cumulative_mean_dist = np.cumsum(samples_dist) / np.arange(1, max_n + 1)
    axes[1, 1].plot(cumulative_mean_dist, linewidth=1.5, alpha=0.7, label=name)
axes[1, 1].axhline(y=mu, color='red', linestyle='--', linewidth=2, label=f'μ={mu}')
axes[1, 1].set_xlabel('Sample size n', fontsize=11)
axes[1, 1].set_ylabel('Sample mean X̄ₙ', fontsize=11)
axes[1, 1].set_title('Law of Large Numbers for Different Distributions', fontsize=12, fontweight='bold')
axes[1, 1].legend(fontsize=9)
axes[1, 1].grid(alpha=0.3)
axes[1, 1].set_xlim([0, 5000])
plt.tight_layout()
plt.show()
print("Verification of Weak Law of Large Numbers:")
print(f"True expected value: μ = {mu}")
print(f"Sample mean at n=100: {cumulative_mean[99]:.4f}")
print(f"Sample mean at n=1000: {cumulative_mean[999]:.4f}")
print(f"Sample mean at n=10000: {cumulative_mean[9999]:.4f}")
Output:
Verification of Weak Law of Large Numbers:
True expected value: μ = 5
Sample mean at n=100: 5.0234
Sample mean at n=1000: 4.9876
Sample mean at n=10000: 5.0012
2.2 Strong Law of Large Numbers
📊 Theorem: Strong Law of Large Numbers
For independent and identically distributed random variables \(X_1, X_2, \ldots\), if \(E[|X_i|] < \infty\), then the sample mean converges almost surely to the expectation:
\[P\left(\lim_{n \to \infty} \bar{X}_n = \mu\right) = 1\]
Strong law vs. weak law:
- Weak law: For any \(\epsilon\), \(P(|\bar{X}_n - \mu| > \epsilon) \to 0\) (convergence in probability)
- Strong law: \(P(\bar{X}_n \to \mu) = 1\) (almost sure convergence)
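The distinction can be made concrete numerically: the weak law concerns a deviation at one fixed n, while the strong law concerns whole paths staying near \(\mu\) from some point on. A minimal sketch (parameter values are illustrative, and the supremum over \(m \ge n\) is necessarily truncated at a finite horizon):
# Contrast of the two laws: deviation at a fixed n (weak law) vs. deviation
# anywhere beyond n along the same path (strong law).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, epsilon = 5.0, 2.0, 0.5
n, max_n, n_paths = 50, 2000, 2000
fixed_n_exceed = 0   # counts |X̄_n - μ| > ε at exactly m = n
path_exceed = 0      # counts |X̄_m - μ| > ε for some m in [n, max_n]
for _ in range(n_paths):
    x = rng.normal(mu, sigma, max_n)
    running_mean = np.cumsum(x) / np.arange(1, max_n + 1)
    dev = np.abs(running_mean[n - 1:] - mu)
    fixed_n_exceed += dev[0] > epsilon
    path_exceed += dev.max() > epsilon
print(f"P(|X̄_n - μ| > ε) at n={n}:       ≈ {fixed_n_exceed / n_paths:.3f}")
print(f"P(sup over m≥n of |X̄_m - μ| > ε) ≈ {path_exceed / n_paths:.3f}")
Both estimates shrink to zero as n grows; almost-sure convergence is precisely the statement that the second one does.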
💻 Code Example 2: Demonstration of Strong Law of Large Numbers
# Simulation with multiple independent paths
np.random.seed(123)
n_paths = 50 # Number of paths
max_n = 5000
mu = 10
sigma = 3
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# (1) Sample mean trajectories for multiple paths
for i in range(n_paths):
    samples = np.random.normal(mu, sigma, max_n)
    cumulative_mean = np.cumsum(samples) / np.arange(1, max_n + 1)
    axes[0, 0].plot(cumulative_mean, alpha=0.3, linewidth=0.8, color='#667eea')
axes[0, 0].axhline(y=mu, color='red', linestyle='--', linewidth=2.5, label=f'μ={mu}')
axes[0, 0].set_xlabel('Sample size n', fontsize=11)
axes[0, 0].set_ylabel('Sample mean X̄ₙ', fontsize=11)
axes[0, 0].set_title(f'{n_paths} Independent Paths (Strong Law of Large Numbers)', fontsize=12, fontweight='bold')
axes[0, 0].legend()
axes[0, 0].grid(alpha=0.3)
axes[0, 0].set_ylim([mu-2, mu+2])
# (2) Decay of maximum deviation
n_checkpoints = np.array([10, 50, 100, 500, 1000, 2000, 5000])
max_deviations = []
for n in n_checkpoints:
    deviations = []
    for _ in range(1000):
        samples = np.random.normal(mu, sigma, n)
        sample_mean = samples.mean()
        deviations.append(abs(sample_mean - mu))
    max_deviations.append(np.max(deviations))
axes[0, 1].loglog(n_checkpoints, max_deviations, 'o-', color='#764ba2',
                  linewidth=2, markersize=8, label='Maximum deviation')
axes[0, 1].loglog(n_checkpoints, sigma / np.sqrt(n_checkpoints), '--',
                  color='red', linewidth=2, label='σ/√n (theoretical)')
axes[0, 1].set_xlabel('Sample size n', fontsize=11)
axes[0, 1].set_ylabel('max |X̄ₙ - μ|', fontsize=11)
axes[0, 1].set_title('Decay of Maximum Deviation', fontsize=12, fontweight='bold')
axes[0, 1].legend()
axes[0, 1].grid(alpha=0.3, which='both')
# (3) Confidence bands
n_trials = 500
all_paths = []
for _ in range(n_trials):
    samples = np.random.normal(mu, sigma, max_n)
    cumulative_mean = np.cumsum(samples) / np.arange(1, max_n + 1)
    all_paths.append(cumulative_mean)
all_paths = np.array(all_paths)
percentile_5 = np.percentile(all_paths, 5, axis=0)
percentile_95 = np.percentile(all_paths, 95, axis=0)
median = np.percentile(all_paths, 50, axis=0)
x_axis = np.arange(1, max_n + 1)
axes[1, 0].fill_between(x_axis, percentile_5, percentile_95,
                        alpha=0.3, color='#667eea', label='90% band (5th–95th percentiles)')
axes[1, 0].plot(x_axis, median, color='#667eea', linewidth=2, label='Median')
axes[1, 0].axhline(y=mu, color='red', linestyle='--', linewidth=2, label=f'μ={mu}')
axes[1, 0].set_xlabel('Sample size n', fontsize=11)
axes[1, 0].set_ylabel('Sample mean X̄ₙ', fontsize=11)
axes[1, 0].set_title('Empirical 90% Band of Sample-Mean Paths', fontsize=12, fontweight='bold')
axes[1, 0].legend()
axes[1, 0].grid(alpha=0.3)
axes[1, 0].set_ylim([mu-1, mu+1])
# (4) Decay of variance
variance_n = np.var(all_paths, axis=0)
theoretical_var = sigma**2 / x_axis
axes[1, 1].loglog(x_axis[::10], variance_n[::10], 'o', color='#667eea',
                  markersize=3, alpha=0.6, label='Empirical variance')
axes[1, 1].loglog(x_axis, theoretical_var, '--', color='red',
                  linewidth=2, label='σ²/n (theoretical)')
axes[1, 1].set_xlabel('Sample size n', fontsize=11)
axes[1, 1].set_ylabel('Var(X̄ₙ)', fontsize=11)
axes[1, 1].set_title('Decay of Sample Mean Variance', fontsize=12, fontweight='bold')
axes[1, 1].legend()
axes[1, 1].grid(alpha=0.3, which='both')
plt.tight_layout()
plt.show()
print("Demonstration of Strong Law of Large Numbers:")
print(f"Range of sample means at n=5000 over 500 independent trials:")
print(f" Minimum: {all_paths[:, -1].min():.4f}")
print(f" Maximum: {all_paths[:, -1].max():.4f}")
print(f" Median: {np.median(all_paths[:, -1]):.4f}")
print(f" Standard deviation: {np.std(all_paths[:, -1]):.4f}")
2.3 Central Limit Theorem
📊 Theorem: Central Limit Theorem (CLT)
For independent and identically distributed random variables \(X_1, X_2, \ldots, X_n\) with expectation \(E[X_i] = \mu\) and variance \(Var(X_i) = \sigma^2 < \infty\), the standardized sample mean converges in distribution to the standard normal distribution:
\[Z_n = \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} = \frac{\sum_{i=1}^n X_i - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} N(0, 1)\]
That is:
\[\lim_{n \to \infty} P(Z_n \leq z) = \Phi(z) = \int_{-\infty}^z \frac{1}{\sqrt{2\pi}} e^{-t^2/2} \, dt\]
Significance: regardless of the original distribution, the distribution of the sample mean approaches a normal distribution.
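A standard proof sketch uses characteristic functions: with \(Y_i = (X_i - \mu)/\sigma\), so that \(Z_n = \frac{1}{\sqrt{n}}\sum_{i=1}^n Y_i\) with \(E[Y_i] = 0\) and \(E[Y_i^2] = 1\), the characteristic function factorizes and expands as
\[\varphi_{Z_n}(t) = \left[\varphi_Y\!\left(\frac{t}{\sqrt{n}}\right)\right]^n = \left[1 - \frac{t^2}{2n} + o\!\left(\frac{1}{n}\right)\right]^n \longrightarrow e^{-t^2/2},\]
which is the characteristic function of \(N(0,1)\); convergence in distribution then follows from Lévy's continuity theorem.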
💻 Code Example 3: Visualization of Central Limit Theorem
# Examine the distribution of sample means from various distributions
np.random.seed(42)
# Original distributions
distributions = {
    'Uniform U(0,1)': {
        'generator': lambda n: np.random.uniform(0, 1, n),
        'mu': 0.5,
        'sigma': 1/np.sqrt(12)
    },
    'Bernoulli (p=0.3)': {
        'generator': lambda n: np.random.binomial(1, 0.3, n),
        'mu': 0.3,
        'sigma': np.sqrt(0.3 * 0.7)
    },
    'Exponential Exp(1)': {
        'generator': lambda n: np.random.exponential(1, n),
        'mu': 1,
        'sigma': 1
    },
    'Beta Beta(0.5,0.5)': {
        'generator': lambda n: np.random.beta(0.5, 0.5, n),
        'mu': 0.5,
        'sigma': 1/(2*np.sqrt(2))
    }
}
sample_sizes = [1, 5, 30, 100]
n_trials = 10000
fig, axes = plt.subplots(len(distributions), len(sample_sizes),
                         figsize=(16, 12))
for i, (dist_name, dist_info) in enumerate(distributions.items()):
    generator = dist_info['generator']
    mu = dist_info['mu']
    sigma = dist_info['sigma']
    for j, n in enumerate(sample_sizes):
        # Calculate sample means
        sample_means = []
        for _ in range(n_trials):
            samples = generator(n)
            sample_means.append(samples.mean())
        sample_means = np.array(sample_means)
        # Standardize
        z_scores = (sample_means - mu) / (sigma / np.sqrt(n))
        # Histogram
        axes[i, j].hist(z_scores, bins=50, density=True, alpha=0.6,
                        color='#667eea', edgecolor='black', linewidth=0.5)
        # Overlay standard normal distribution
        x = np.linspace(-4, 4, 1000)
        axes[i, j].plot(x, stats.norm.pdf(x), 'r-', linewidth=2.5,
                        label='N(0,1)')
        axes[i, j].set_xlim([-4, 4])
        axes[i, j].set_ylim([0, 0.5])
        if j == 0:
            axes[i, j].set_ylabel(dist_name, fontsize=10, fontweight='bold')
        if i == 0:
            axes[i, j].set_title(f'n={n}', fontsize=11, fontweight='bold')
        if i == len(distributions) - 1:
            axes[i, j].set_xlabel('Z = (X̄ₙ - μ)/(σ/√n)', fontsize=9)
        axes[i, j].grid(alpha=0.3)
        axes[i, j].legend(fontsize=8)
plt.suptitle('Visualization of Central Limit Theorem: Verification Across Various Distributions',
             fontsize=14, fontweight='bold', y=0.995)
plt.tight_layout()
plt.show()
print("Verification of Central Limit Theorem:")
print("="*60)
for dist_name, dist_info in distributions.items():
    print(f"\n{dist_name}:")
    for n in [5, 30, 100]:
        sample_means = [dist_info['generator'](n).mean() for _ in range(n_trials)]
        z_scores = (np.array(sample_means) - dist_info['mu']) / (dist_info['sigma'] / np.sqrt(n))
        print(f"  n={n:3d}: mean={np.mean(z_scores):6.3f}, variance={np.var(z_scores):6.3f}")
2.4 Convergence from Non-Normal to Normal Distributions
💻 Code Example 4: Convergence from Non-Normal to Normal Distributions
# Central limit theorem for extremely non-normal distributions
np.random.seed(42)
# Extremely skewed distribution (log-normal distribution)
mu_log = 0
sigma_log = 1
def lognormal_generator(n):
    return np.random.lognormal(mu_log, sigma_log, n)
# Theoretical statistics for log-normal distribution
true_mean = np.exp(mu_log + sigma_log**2 / 2)
true_var = (np.exp(sigma_log**2) - 1) * np.exp(2*mu_log + sigma_log**2)
true_sigma = np.sqrt(true_var)
sample_sizes = [1, 2, 5, 10, 30, 100]
n_trials = 5000
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.flatten()
for idx, n in enumerate(sample_sizes):
    # Calculate sample means
    sample_means = []
    for _ in range(n_trials):
        samples = lognormal_generator(n)
        sample_means.append(samples.mean())
    sample_means = np.array(sample_means)
    # Standardize
    z_scores = (sample_means - true_mean) / (true_sigma / np.sqrt(n))
    ax_main = axes[idx]
    # Histogram
    ax_main.hist(z_scores, bins=40, density=True, alpha=0.5,
                 color='#667eea', edgecolor='black', linewidth=0.5)
    # Theoretical normal distribution
    x = np.linspace(-4, 4, 1000)
    ax_main.plot(x, stats.norm.pdf(x), 'r-', linewidth=2.5, label='N(0,1)')
    ax_main.set_xlim([-4, 4])
    ax_main.set_title(f'n={n}', fontsize=12, fontweight='bold')
    ax_main.set_xlabel('Standardized sample mean', fontsize=10)
    ax_main.set_ylabel('Density', fontsize=10)
    ax_main.legend(fontsize=9)
    ax_main.grid(alpha=0.3)
plt.suptitle('Central Limit Theorem for Log-Normal Distribution (Extremely Skewed)',
             fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()
# Test normality with Kolmogorov-Smirnov test
print("Verification of Central Limit Theorem for Log-Normal Distribution:")
print("="*60)
print(f"Skewness of original distribution: {stats.lognorm.stats(s=sigma_log, moments='s'):.4f}")
print(f"Kurtosis of original distribution: {stats.lognorm.stats(s=sigma_log, moments='k'):.4f}")
print("\nKolmogorov-Smirnov test (normality test):")
for n in sample_sizes:
    sample_means = [lognormal_generator(n).mean() for _ in range(n_trials)]
    z_scores = (np.array(sample_means) - true_mean) / (true_sigma / np.sqrt(n))
    ks_stat, p_value = stats.kstest(z_scores, 'norm')
    print(f"  n={n:3d}: KS statistic={ks_stat:.4f}, p-value={p_value:.4f} "
          f"{'(consistent with normal)' if p_value > 0.05 else '(significantly different)'}")
2.5 Distribution of Sample Mean and Sample Variance
📊 Theorem: Distributions of Sample Statistics
When \(X_1, \ldots, X_n \sim N(\mu, \sigma^2)\) (i.i.d.):
Sample mean:
\[\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i \sim N\left(\mu, \frac{\sigma^2}{n}\right)\]
Sample variance:
\[S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2, \quad \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}\]
t-statistic:
\[T = \frac{\bar{X} - \mu}{S / \sqrt{n}} \sim t_{n-1}\]
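Code Examples 5 and 6 below verify the first two distributions by simulation; the t-statistic can be checked the same way. A minimal sketch (the parameter values are assumptions of the illustration):
# Sketch: verify T = (X̄ - μ)/(S/√n) ~ t_{n-1} by simulation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
mu, sigma, n, n_trials = 50.0, 10.0, 10, 10000
t_stats = np.empty(n_trials)
for k in range(n_trials):
    x = rng.normal(mu, sigma, n)
    t_stats[k] = (x.mean() - mu) / (x.std(ddof=1) / np.sqrt(n))
# Compare empirical and theoretical tail probability at the 97.5% quantile
t_crit = stats.t.ppf(0.975, n - 1)
print(f"Empirical P(T > t_crit) = {(t_stats > t_crit).mean():.4f} (theory: 0.0250)")
ks_stat, p_value = stats.kstest(t_stats, 't', args=(n - 1,))
print(f"KS test against t({n-1}): statistic={ks_stat:.4f}, p-value={p_value:.4f}")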
💻 Code Example 5: Distribution of Sample Mean
# Distribution of sample mean
np.random.seed(42)
mu = 50
sigma = 10
sample_sizes = [5, 10, 30, 100]
n_trials = 10000
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
axes = axes.flatten()
for idx, n in enumerate(sample_sizes):
    # Calculate sample means
    sample_means = []
    for _ in range(n_trials):
        samples = np.random.normal(mu, sigma, n)
        sample_means.append(samples.mean())
    sample_means = np.array(sample_means)
    # Histogram
    axes[idx].hist(sample_means, bins=50, density=True, alpha=0.6,
                   color='#667eea', edgecolor='black', linewidth=0.5,
                   label='Simulation')
    # Theoretical distribution N(μ, σ²/n)
    x = np.linspace(mu - 4*sigma/np.sqrt(n), mu + 4*sigma/np.sqrt(n), 1000)
    theoretical_pdf = stats.norm.pdf(x, mu, sigma/np.sqrt(n))
    axes[idx].plot(x, theoretical_pdf, 'r-', linewidth=2.5,
                   label=f'N({mu}, {sigma**2/n:.2f})')
    axes[idx].axvline(mu, color='green', linestyle='--', linewidth=2,
                      label=f'μ={mu}')
    axes[idx].set_title(f'n={n}', fontsize=12, fontweight='bold')
    axes[idx].set_xlabel('Sample mean X̄', fontsize=11)
    axes[idx].set_ylabel('Density', fontsize=11)
    axes[idx].legend(fontsize=9)
    axes[idx].grid(alpha=0.3)
    # Statistics
    print(f"n={n:3d}: theoretical σ²/n={sigma**2/n:6.2f}, "
          f"empirical Var(X̄)={np.var(sample_means):6.2f}")
plt.suptitle('Distribution of Sample Mean', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()
💻 Code Example 6: Distribution of Sample Variance
# Distribution of sample variance
np.random.seed(42)
mu = 0
sigma = 5
n = 10 # Sample size
n_trials = 10000
# Calculate sample variances
sample_variances = []
for _ in range(n_trials):
    samples = np.random.normal(mu, sigma, n)
    sample_var = np.var(samples, ddof=1)  # Unbiased variance (divide by n-1)
    sample_variances.append(sample_var)
sample_variances = np.array(sample_variances)
# Distribution of (n-1)S²/σ² (chi-square distribution)
chi_squared_stat = (n - 1) * sample_variances / sigma**2
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# (1) Distribution of sample variance
axes[0].hist(sample_variances, bins=50, density=True, alpha=0.6,
             color='#667eea', edgecolor='black', linewidth=0.5,
             label='Simulation')
# Theoretical distribution (scaled chi-square distribution)
x_var = np.linspace(0, 80, 1000)
theoretical_var_pdf = stats.chi2.pdf((n-1)*x_var/sigma**2, n-1) * (n-1)/sigma**2
axes[0].plot(x_var, theoretical_var_pdf, 'r-', linewidth=2.5,
             label='Theoretical distribution')
axes[0].axvline(sigma**2, color='green', linestyle='--', linewidth=2,
                label=f'σ²={sigma**2}')
axes[0].set_xlabel('Sample variance S²', fontsize=11)
axes[0].set_ylabel('Density', fontsize=11)
axes[0].set_title('Distribution of Sample Variance', fontsize=12, fontweight='bold')
axes[0].legend()
axes[0].grid(alpha=0.3)
# (2) Comparison with chi-square distribution
axes[1].hist(chi_squared_stat, bins=50, density=True, alpha=0.6,
             color='#764ba2', edgecolor='black', linewidth=0.5,
             label='Simulation')
x_chi = np.linspace(0, 30, 1000)
theoretical_chi_pdf = stats.chi2.pdf(x_chi, n-1)
axes[1].plot(x_chi, theoretical_chi_pdf, 'r-', linewidth=2.5,
             label=f'χ²({n-1})')
axes[1].set_xlabel('(n-1)S²/σ²', fontsize=11)
axes[1].set_ylabel('Density', fontsize=11)
axes[1].set_title(f'Comparison with Chi-Square Distribution χ²({n-1})', fontsize=12, fontweight='bold')
axes[1].legend()
axes[1].grid(alpha=0.3)
plt.tight_layout()
plt.show()
print("Verification of sample variance:")
print(f"Theoretical: E[S²] = σ² = {sigma**2}")
print(f"Empirical: E[S²] = {np.mean(sample_variances):.4f}")
print(f"\nTheoretical: E[(n-1)S²/σ²] = n-1 = {n-1}")
print(f"Empirical: E[(n-1)S²/σ²] = {np.mean(chi_squared_stat):.4f}")
2.6 Application to Materials Science Data
💻 Code Example 7: Application to Materials Science Data
# Real data simulation: Material strength measurement
np.random.seed(42)
# Virtual data: Flexural strength of ceramics (MPa)
# True distribution is Weibull (typical distribution for material strength)
shape_param = 10 # Shape parameter
scale_param = 500 # Characteristic strength (MPa)
true_dist = stats.weibull_min(c=shape_param, scale=scale_param)
true_mean = true_dist.mean()
true_std = true_dist.std()
print("Statistical Analysis of Material Strength Data")
print("="*60)
print(f"True distribution: Weibull distribution (shape={shape_param}, scale={scale_param})")
print(f"True mean: {true_mean:.2f} MPa")
print(f"True standard deviation: {true_std:.2f} MPa")
# Experiments: Measurements with different sample sizes
sample_sizes = [3, 5, 10, 30, 50, 100]
n_experiments = 1000 # Number of experiments
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.flatten()
for idx, n in enumerate(sample_sizes):
    # Calculate sample means for each sample size
    sample_means = []
    for _ in range(n_experiments):
        samples = true_dist.rvs(n)
        sample_means.append(samples.mean())
    sample_means = np.array(sample_means)
    # Histogram
    axes[idx].hist(sample_means, bins=40, density=True, alpha=0.6,
                   color='#667eea', edgecolor='black', linewidth=0.5,
                   label='Measurement results')
    # Approximation by central limit theorem (normal distribution)
    x = np.linspace(sample_means.min(), sample_means.max(), 1000)
    clt_pdf = stats.norm.pdf(x, true_mean, true_std/np.sqrt(n))
    axes[idx].plot(x, clt_pdf, 'r-', linewidth=2.5,
                   label='CLT approximation')
    axes[idx].axvline(true_mean, color='green', linestyle='--',
                      linewidth=2, label='True mean')
    axes[idx].set_title(f'Sample size n={n}', fontsize=12, fontweight='bold')
    axes[idx].set_xlabel('Average strength (MPa)', fontsize=10)
    axes[idx].set_ylabel('Density', fontsize=10)
    axes[idx].legend(fontsize=8)
    axes[idx].grid(alpha=0.3)
    # Statistical inference
    sem = true_std / np.sqrt(n)  # Standard error of the mean
    ci_95 = stats.norm.interval(0.95, loc=true_mean, scale=sem)
    print(f"\nn={n:3d}:")
    print(f"  Standard error (SEM): {sem:.2f} MPa")
    print(f"  95% sampling interval for X̄: [{ci_95[0]:.2f}, {ci_95[1]:.2f}] MPa")
    print(f"  Empirical standard deviation of mean: {np.std(sample_means):.2f} MPa")
plt.suptitle('Application of Central Limit Theorem in Material Strength Measurement',
             fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()
# Practical inference
print("\n\n[Practical Inference]")
print("="*60)
n_test = 10 # Actual number of test specimens
test_samples = true_dist.rvs(n_test)
test_mean = test_samples.mean()
test_std = test_samples.std(ddof=1)
print(f"Number of test specimens: {n_test}")
print(f"Measured average strength: {test_mean:.2f} MPa")
print(f"Measured standard deviation: {test_std:.2f} MPa")
# Confidence interval using t-distribution (true variance unknown)
t_critical = stats.t.ppf(0.975, n_test - 1)
ci_lower = test_mean - t_critical * test_std / np.sqrt(n_test)
ci_upper = test_mean + t_critical * test_std / np.sqrt(n_test)
print(f"\n95% confidence interval (t-distribution): [{ci_lower:.2f}, {ci_upper:.2f}] MPa")
print(f"True mean {true_mean:.2f} MPa is {'included in' if ci_lower <= true_mean <= ci_upper else 'not included in'} the confidence interval")
💡 Note: In materials science, population characteristics must be estimated from a limited number of test specimens. The central limit theorem enables t-tests and the construction of confidence intervals even when the underlying distribution (such as a Weibull distribution) is not normal, because the sample mean is approximately normal for moderate to large sample sizes.
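As a quick check of that claim, the sketch below (reusing the Weibull parameters from Code Example 7; the sample size n=10 and level α=0.05 are assumptions of the illustration) estimates the type-I error rate of a one-sample t-test against the true mean. If the CLT approximation is adequate, the rejection rate should be close to the nominal 5%:
# Sketch: type-I error of a one-sample t-test on non-normal (Weibull) data.
# If the sample mean is approximately normal, the rejection rate at level
# alpha should be close to alpha despite the skewed population.
import numpy as np
from scipy import stats

np.random.seed(0)
true_dist = stats.weibull_min(c=10, scale=500)  # same model as Code Example 7
true_mean = true_dist.mean()
n, n_experiments, alpha = 10, 2000, 0.05
rejections = 0
for _ in range(n_experiments):
    samples = true_dist.rvs(n)
    t_stat, p_value = stats.ttest_1samp(samples, popmean=true_mean)
    rejections += p_value < alpha
print(f"Empirical rejection rate at α={alpha}: {rejections / n_experiments:.3f}")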
Exercises
📝 Exercise 1: Verification of Law of Large Numbers
Simulate a dice rolling experiment and verify the following:
- Visualize that the sample mean of the outcomes converges to the theoretical value of 3.5
- Calculate the upper bound of the probability that |X̄ₙ - 3.5| > 0.2 using Chebyshev's inequality
- Compare the convergence rates for different sample sizes (n=10, 100, 1000)
📝 Exercise 2: Application of Central Limit Theorem
Take n samples independently from a uniform distribution U(0, 10) and calculate the sample mean:
- Visualize the distribution of sample means with histograms for n=5, 10, 30, 100
- Verify that the standardized sample mean follows a standard normal distribution using Q-Q plots
- Statistically verify normality using the Kolmogorov-Smirnov test
📝 Exercise 3: Construction of Confidence Intervals
Take a sample of n=25 from a normal distribution N(100, 15²) and calculate the following:
- Construct a 95% confidence interval for the sample mean (when population variance is known)
- Calculate the sample mean and sample variance, and construct a 95% confidence interval using the t-distribution (when population variance is unknown)
- Perform 1000 independent simulations and verify the proportion of times the confidence interval contains the true mean