
Chapter 3: Image Data Analysis

Automated Analysis of SEM/TEM Images - From Particle Detection to Deep Learning

📖 Reading time: 30-35 minutes 📊 Level: Intermediate 💻 Code examples: 13 📝 Exercises: 3


Implement a basic pipeline for extracting particle information from SEM/TEM images. Tips for robustness against noise and overlapping particles are also introduced.

💡 Note: The workflow is fixed in the order: preprocessing (smoothing, binarization) → feature extraction → post-processing. The quality of annotations determines performance.


Learning Objectives

By completing this chapter, you will be able to:

- Apply preprocessing (noise removal, contrast adjustment, binarization) to SEM/TEM images
- Detect and separate particles with the Watershed method and quantify particle size and shape distributions
- Classify material images with transfer learning (VGG16) and evaluate the results
- Record metadata, calibration, and environment information for reproducible image analysis


3.1 Characteristics of Image Data and Preprocessing Strategy

Characteristics of SEM/TEM Images

Electron microscopy images are powerful tools for visualizing nano- to micro-scale structures of materials.

| Measurement Technique | Spatial Resolution | Typical Field of View | Main Information | Image Characteristics |
|---|---|---|---|---|
| SEM | Several nm to several μm | 10 μm to 1 mm | Surface morphology, microstructure | Deep depth of field, shadow effects |
| TEM | Atomic level | Tens of nm to several μm | Internal structure, crystallinity | High contrast, diffraction patterns |
| STEM | Sub-nm | Tens to hundreds of nm | Atomic arrangement, elemental distribution | Atomic resolution |

Typical Workflow for Image Analysis

flowchart TD
    A[Image Acquisition] --> B[Preprocessing]
    B --> C[Segmentation]
    C --> D[Feature Extraction]
    D --> E[Quantitative Analysis]
    E --> F[Statistical Processing]
    F --> G[Visualization & Reporting]
    B --> B1[Noise Removal]
    B --> B2[Contrast Adjustment]
    B --> B3[Binarization]
    style A fill:#e3f2fd
    style B fill:#fff3e0
    style C fill:#f3e5f5
    style D fill:#e8f5e9
    style E fill:#fce4ec
    style G fill:#fff9c4

3.2 Data Licensing and Reproducibility

Image Data Repositories and Licensing

Utilization of microscopy image databases is essential for algorithm development and validation.

Major Image Databases

| Database | Content | License | Access | Citation Requirements |
|---|---|---|---|---|
| EMPIAR (Electron Microscopy) | TEM/cryo-EM images | CC BY 4.0 | Free | Required |
| NanoMine | Nanocomposite SEM | CC BY 4.0 | Free | Recommended |
| Materials Data Facility | Materials image datasets | Mixed | Free | Required |
| Kaggle Datasets (Microscopy) | Various microscopy images | Mixed | Free | Verification required |
| NIST SRD | Standard reference images | Public Domain | Free | Recommended |

Notes on Data Usage

Example of Using Public Data:

"""
SEM image from NanoMine database.
Reference: NanoMine Dataset L123 - Polymer Nanocomposite
Citation: Zhao, H. et al. (2016) Computational Materials Science
License: CC BY 4.0
URL: https://materialsmine.org/nm/L123
Scale bar: 500 nm (calibration: 2.5 nm/pixel)
"""

Recording Image Metadata:

IMAGE_METADATA = {
    'instrument': 'FEI Quanta 200',
    'accelerating_voltage': '20 kV',
    'magnification': '10000x',
    'working_distance': '10 mm',
    'pixel_size': 2.5,  # nm/pixel
    'scale_bar': 500,   # nm
    'detector': 'SE detector',
    'acquisition_date': '2025-10-15'
}

# Save metadata to JSON (ensuring reproducibility)
import json
with open('image_metadata.json', 'w') as f:
    json.dump(IMAGE_METADATA, f, indent=2)

Best Practices for Code Reproducibility

Recording Environment Information

# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0
# - opencv-python>=4.8.0
# - tensorflow>=2.13.0, <2.16.0

"""
Example: Recording Environment Information

Purpose: Demonstrate core concepts and implementation patterns
Target: Advanced
Execution time: ~5 seconds
Dependencies: None
"""

import sys
import cv2
import numpy as np
from skimage import __version__ as skimage_version
import tensorflow as tf

print("=== Image Analysis Environment ===")
print(f"Python: {sys.version}")
print(f"OpenCV: {cv2.__version__}")
print(f"NumPy: {np.__version__}")
print(f"scikit-image: {skimage_version}")
print(f"TensorFlow: {tf.__version__}")

# Recommended Versions (as of October 2025):
# - Python: 3.10 or higher
# - OpenCV: 4.8 or higher
# - NumPy: 1.24 or higher
# - scikit-image: 0.21 or higher
# - TensorFlow: 2.13 or higher (GPU version recommended)
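
To capture the full installed package set, not just the versions printed above, one option is to freeze the environment to a file. A minimal sketch (the output filename is illustrative):

import subprocess
import sys

# Write the complete installed-package list for the record
with open('requirements_freeze.txt', 'w') as f:
    subprocess.run([sys.executable, '-m', 'pip', 'freeze'],
                   stdout=f, check=True)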

Parameter Documentation

Bad Example (not reproducible):

denoised = cv2.fastNlMeansDenoising(image, None, 10, 7, 21)  # Why these values?

Good Example (reproducible):

# Non-Local Means parameter settings
NLM_H = 10  # Filter strength (corresponding to noise level, typical SEM values: 5-15)
NLM_TEMPLATE_WINDOW = 7  # Template window size (odd number, recommended: 7)
NLM_SEARCH_WINDOW = 21   # Search window size (odd number, recommended: 21)
NLM_DESCRIPTION = """
h: Adjust according to noise level. Low noise: 5-7, high noise: 10-15
templateWindowSize: 7 is standard (Smaller is faster but leaves residual noise)
searchWindowSize: 21 is standard (Larger is higher quality but slower)
"""
denoised = cv2.fastNlMeansDenoising(
    image, None, NLM_H, NLM_TEMPLATE_WINDOW, NLM_SEARCH_WINDOW
)

Recording Image Calibration

# Pixel-to-physical size conversion parameters
CALIBRATION_PARAMS = {
    'pixel_size_nm': 2.5,  # nm/pixel (calibrated from scale bar)
    'scale_bar_length_nm': 500,  # Physical size of scale bar
    'scale_bar_pixels': 200,  # Number of pixels in scale bar
    'calibration_date': '2025-10-15',
    'calibration_method': 'Scale bar measurement',
    'uncertainty': 0.1  # nm/pixel (calibration uncertainty)
}

# Convert pixels to physical size
def pixels_to_nm(pixels, calib_params=CALIBRATION_PARAMS):
    """Convert pixel count to physical size (nm)"""
    return pixels * calib_params['pixel_size_nm']

# Usage Example
diameter_pixels = 50
diameter_nm = pixels_to_nm(diameter_pixels)
print(f"Particle diameter: {diameter_nm:.1f} ± {CALIBRATION_PARAMS['uncertainty']*diameter_pixels:.1f} nm")

3.3 Image Preprocessing

Noise Removal

Code Example 1: Comparison of Various Noise Removal Filters

# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0
# - opencv-python>=4.8.0
# - scipy>=1.11.0

import numpy as np
import matplotlib.pyplot as plt
import cv2
from skimage import filters, io
from scipy import ndimage

# Fixed random seed (ensuring reproducibility)
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)

# Generate synthetic SEM image (simulating particles)
def generate_synthetic_sem(size=512, num_particles=30):
    """Generate synthetic SEM image"""
    image = np.zeros((size, size), dtype=np.float32)

    # Random particle placement
    for _ in range(num_particles):
        x = np.random.randint(50, size - 50)
        y = np.random.randint(50, size - 50)
        radius = np.random.randint(15, 35)

        # Circular particles
        Y, X = np.ogrid[:size, :size]
        mask = (X - x)**2 + (Y - y)**2 <= radius**2
        image[mask] = 200

    # Add Gaussian noise
    noise = np.random.normal(0, 25, image.shape)
    noisy_image = np.clip(image + noise, 0, 255).astype(np.uint8)

    return noisy_image

# Generate image
noisy_image = generate_synthetic_sem()

# Various noise removal filters
gaussian_blur = cv2.GaussianBlur(noisy_image, (5, 5), 1.0)
median_filter = cv2.medianBlur(noisy_image, 5)
bilateral_filter = cv2.bilateralFilter(noisy_image, 9, 75, 75)
nlm_filter = cv2.fastNlMeansDenoising(noisy_image, None, 10, 7, 21)

# Visualization
fig, axes = plt.subplots(2, 3, figsize=(15, 10))

axes[0, 0].imshow(noisy_image, cmap='gray')
axes[0, 0].set_title('Noisy SEM Image')
axes[0, 0].axis('off')

axes[0, 1].imshow(gaussian_blur, cmap='gray')
axes[0, 1].set_title('Gaussian Blur')
axes[0, 1].axis('off')

axes[0, 2].imshow(median_filter, cmap='gray')
axes[0, 2].set_title('Median Filter')
axes[0, 2].axis('off')

axes[1, 0].imshow(bilateral_filter, cmap='gray')
axes[1, 0].set_title('Bilateral Filter')
axes[1, 0].axis('off')

axes[1, 1].imshow(nlm_filter, cmap='gray')
axes[1, 1].set_title('Non-Local Means')
axes[1, 1].axis('off')

# Noise Level Comparison
axes[1, 2].bar(['Original', 'Gaussian', 'Median', 'Bilateral', 'NLM'],
               [np.std(noisy_image),
                np.std(gaussian_blur),
                np.std(median_filter),
                np.std(bilateral_filter),
                np.std(nlm_filter)])
axes[1, 2].set_ylabel('Noise Level (std)')
axes[1, 2].set_title('Denoising Performance')
axes[1, 2].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

print("=== Noise Level (Standard Deviation) ===")
print(f"Original image: {np.std(noisy_image):.2f}")
print(f"Gaussian: {np.std(gaussian_blur):.2f}")
print(f"Median: {np.std(median_filter):.2f}")
print(f"Bilateral: {np.std(bilateral_filter):.2f}")
print(f"NLM: {np.std(nlm_filter):.2f}")

Filter Selection Guidelines:
- Gaussian: fast, but blurs edges → general preprocessing
- Median: edge-preserving, robust to salt-and-pepper noise
- Bilateral: excellent edge preservation, moderate computational cost
- Non-Local Means: highest quality, high computational cost
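
The guidelines above can be turned into a simple dispatcher. The sketch below picks a filter from a rough noise estimate; the MAD-of-Laplacian estimator and the thresholds are illustrative choices, not values prescribed by this chapter:

import cv2
import numpy as np

def denoise_auto(image: np.ndarray) -> np.ndarray:
    """Choose a denoising filter based on a rough noise estimate."""
    # Estimate noise via the median absolute deviation of the Laplacian
    lap = cv2.Laplacian(image, cv2.CV_64F)
    sigma = np.median(np.abs(lap - np.median(lap))) / 0.6745

    if sigma < 5:      # low noise: a cheap Gaussian is enough
        return cv2.GaussianBlur(image, (5, 5), 1.0)
    elif sigma < 15:   # moderate noise: edge-preserving bilateral
        return cv2.bilateralFilter(image, 9, 75, 75)
    else:              # heavy noise: accept the NLM cost
        return cv2.fastNlMeansDenoising(image, None, 10, 7, 21)

auto_denoised = denoise_auto(noisy_image)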

Contrast Adjustment

Code Example 2: Histogram Equalization and CLAHE

from skimage import exposure

# Contrast Adjustment
hist_eq = exposure.equalize_hist(noisy_image)
clahe = exposure.equalize_adapthist(noisy_image, clip_limit=0.03)

# Histogram computation
hist_original = np.histogram(noisy_image, bins=256, range=(0, 256))[0]
hist_eq_vals = np.histogram(
    (hist_eq * 255).astype(np.uint8), bins=256, range=(0, 256))[0]
hist_clahe_vals = np.histogram(
    (clahe * 255).astype(np.uint8), bins=256, range=(0, 256))[0]

# Visualization
fig = plt.figure(figsize=(16, 10))

# Images
ax1 = plt.subplot(2, 3, 1)
ax1.imshow(noisy_image, cmap='gray')
ax1.set_title('Original')
ax1.axis('off')

ax2 = plt.subplot(2, 3, 2)
ax2.imshow(hist_eq, cmap='gray')
ax2.set_title('Histogram Equalization')
ax2.axis('off')

ax3 = plt.subplot(2, 3, 3)
ax3.imshow(clahe, cmap='gray')
ax3.set_title('CLAHE (Adaptive)')
ax3.axis('off')

# Histogram
ax4 = plt.subplot(2, 3, 4)
ax4.hist(noisy_image.ravel(), bins=256, range=(0, 256), alpha=0.7)
ax4.set_xlabel('Pixel Value')
ax4.set_ylabel('Frequency')
ax4.set_title('Original Histogram')
ax4.grid(True, alpha=0.3)

ax5 = plt.subplot(2, 3, 5)
ax5.hist((hist_eq * 255).astype(np.uint8).ravel(),
         bins=256, range=(0, 256), alpha=0.7, color='orange')
ax5.set_xlabel('Pixel Value')
ax5.set_ylabel('Frequency')
ax5.set_title('Histogram Eq. Histogram')
ax5.grid(True, alpha=0.3)

ax6 = plt.subplot(2, 3, 6)
ax6.hist((clahe * 255).astype(np.uint8).ravel(),
         bins=256, range=(0, 256), alpha=0.7, color='green')
ax6.set_xlabel('Pixel Value')
ax6.set_ylabel('Frequency')
ax6.set_title('CLAHE Histogram')
ax6.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("=== Contrast Metrics ===")
print(f"Original image contrast: {noisy_image.max() - noisy_image.min()}")
print(f"Histogram Eq.: {(hist_eq * 255).max() - (hist_eq * 255).min():.1f}")
print(f"CLAHE: {(clahe * 255).max() - (clahe * 255).min():.1f}")

Advantages of CLAHE (Contrast Limited Adaptive Histogram Equalization):
- Local contrast enhancement
- Suppresses excessive enhancement (clip_limit parameter)
- Details remain visible in both dark and bright areas of SEM images
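
The clip_limit trade-off can be inspected directly. A short sketch comparing two settings on the synthetic image from Code Example 1 (the values 2.0 and 8.0 are illustrative starting points):

import cv2

# A higher clipLimit permits stronger local enhancement but also
# amplifies noise in flat regions
clahe_mild = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
clahe_strong = cv2.createCLAHE(clipLimit=8.0, tileGridSize=(8, 8))

enhanced_mild = clahe_mild.apply(noisy_image)
enhanced_strong = clahe_strong.apply(noisy_image)

print(f"std (clipLimit=2.0): {enhanced_mild.std():.1f}")
print(f"std (clipLimit=8.0): {enhanced_strong.std():.1f}")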


3.4 Particle Detection (Watershed Method)

Binarization and Distance Transform

Code Example 3: Automatic Binarization with the Otsu Method

from skimage import morphology, measure
from scipy.ndimage import distance_transform_edt

# Use image after noise removal
denoised = cv2.fastNlMeansDenoising(noisy_image, None, 10, 7, 21)

# Binarization using Otsu method
threshold = filters.threshold_otsu(denoised)
binary = denoised > threshold

# Morphological operations (small noise removal)
binary_cleaned = morphology.remove_small_objects(binary, min_size=50)
binary_cleaned = morphology.remove_small_holes(binary_cleaned, area_threshold=50)

# Distance transform
distance = distance_transform_edt(binary_cleaned)

# Visualization
fig, axes = plt.subplots(2, 2, figsize=(12, 12))

axes[0, 0].imshow(denoised, cmap='gray')
axes[0, 0].set_title('Denoised Image')
axes[0, 0].axis('off')

axes[0, 1].imshow(binary, cmap='gray')
axes[0, 1].set_title(f'Binary (Otsu threshold={threshold:.1f})')
axes[0, 1].axis('off')

axes[1, 0].imshow(binary_cleaned, cmap='gray')
axes[1, 0].set_title('After Morphology')
axes[1, 0].axis('off')

im_dist = axes[1, 1].imshow(distance, cmap='jet')
axes[1, 1].set_title('Distance Transform')
axes[1, 1].axis('off')
plt.colorbar(im_dist, ax=axes[1, 1])  # attach a colorbar to the distance map

plt.tight_layout()
plt.show()

print(f"=== Binarization Results ===")
print(f"Otsu threshold: {threshold:.1f}")
print(f"White pixel percentage: {binary_cleaned.sum() / binary_cleaned.size * 100:.1f}%")

Particle Separation using Watershed Method

Code Example 4: Watershed Segmentation

from skimage.feature import peak_local_max
from skimage.segmentation import watershed

# Detect local maxima (estimate particle centers)
local_max = peak_local_max(
    distance,
    min_distance=20,
    threshold_abs=5,
    labels=binary_cleaned
)

# Create markers
markers = np.zeros_like(distance, dtype=int)
markers[tuple(local_max.T)] = np.arange(1, len(local_max) + 1)

# Execute Watershed
labels = watershed(-distance, markers, mask=binary_cleaned)

# Visualization
fig, axes = plt.subplots(1, 3, figsize=(18, 6))

# Display markers
axes[0].imshow(denoised, cmap='gray')
axes[0].plot(local_max[:, 1], local_max[:, 0], 'r+',
             markersize=12, markeredgewidth=2)
axes[0].set_title(f'Detected Centers ({len(local_max)} particles)')
axes[0].axis('off')

# Watershed labels
axes[1].imshow(labels, cmap='nipy_spectral')
axes[1].set_title('Watershed Segmentation')
axes[1].axis('off')

# Overlay contours
overlay = denoised.copy()
overlay_rgb = cv2.cvtColor(overlay, cv2.COLOR_GRAY2RGB)
for region in measure.regionprops(labels):
    minr, minc, maxr, maxc = region.bbox
    cv2.rectangle(overlay_rgb, (minc, minr), (maxc, maxr),
                  (255, 0, 0), 2)

axes[2].imshow(overlay_rgb)
axes[2].set_title('Detected Particles')
axes[2].axis('off')

plt.tight_layout()
plt.show()

print(f"=== Watershed Results ===")
print(f"Number of detected particles: {len(local_max)}")
print(f"Number of labels: {labels.max()}")

3.5 Particle Size Distribution Analysis

Extraction of Particle Features

Code Example 5: Calculation of Particle Diameter and Shape Parameters

# Requirements:
# - Python 3.9+
# - pandas>=2.0.0, <2.2.0

"""
Example: Calculation of Particle Diameter and Shape Parameters

Purpose: Demonstrate data visualization techniques
Target: Beginner to Intermediate
Execution time: 2-5 seconds
Dependencies: None
"""

# Feature extraction for each particle
particle_data = []

for region in measure.regionprops(labels):
    # Calculate equivalent circular diameter from area
    area = region.area
    equivalent_diameter = np.sqrt(4 * area / np.pi)

    # Aspect ratio
    major_axis = region.major_axis_length
    minor_axis = region.minor_axis_length
    aspect_ratio = major_axis / (minor_axis + 1e-10)

    # Circularity (4π × area / perimeter²)
    perimeter = region.perimeter
    circularity = 4 * np.pi * area / (perimeter ** 2 + 1e-10)

    particle_data.append({
        'label': region.label,
        'area': area,
        'diameter': equivalent_diameter,
        'aspect_ratio': aspect_ratio,
        'circularity': circularity,
        'centroid': region.centroid
    })

# Convert to DataFrame
import pandas as pd
df_particles = pd.DataFrame(particle_data)

print("=== Particle Feature Statistics ===")
print(df_particles[['diameter', 'aspect_ratio', 'circularity']].describe())

# Particle size distribution plot
fig, axes = plt.subplots(2, 2, figsize=(14, 12))

# Histogram
axes[0, 0].hist(df_particles['diameter'], bins=20, alpha=0.7,
                edgecolor='black')
axes[0, 0].set_xlabel('Diameter (pixels)')
axes[0, 0].set_ylabel('Frequency')
axes[0, 0].set_title('Particle Size Distribution')
axes[0, 0].axvline(df_particles['diameter'].mean(), color='red',
                   linestyle='--', label=f'Mean: {df_particles["diameter"].mean():.1f}')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3, axis='y')

# Cumulative distribution
sorted_diameters = np.sort(df_particles['diameter'])
cumulative = np.arange(1, len(sorted_diameters) + 1) / len(sorted_diameters) * 100
axes[0, 1].plot(sorted_diameters, cumulative, linewidth=2)
axes[0, 1].set_xlabel('Diameter (pixels)')
axes[0, 1].set_ylabel('Cumulative Percentage (%)')
axes[0, 1].set_title('Cumulative Size Distribution')
axes[0, 1].axhline(50, color='red', linestyle='--',
                   label=f'D50: {np.median(df_particles["diameter"]):.1f}')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Scatter plot (diameter vs aspect ratio)
axes[1, 0].scatter(df_particles['diameter'],
                   df_particles['aspect_ratio'],
                   alpha=0.6, s=50)
axes[1, 0].set_xlabel('Diameter (pixels)')
axes[1, 0].set_ylabel('Aspect Ratio')
axes[1, 0].set_title('Diameter vs Aspect Ratio')
axes[1, 0].grid(True, alpha=0.3)

# Scatter plot (diameter vs circularity)
axes[1, 1].scatter(df_particles['diameter'],
                   df_particles['circularity'],
                   alpha=0.6, s=50, color='green')
axes[1, 1].set_xlabel('Diameter (pixels)')
axes[1, 1].set_ylabel('Circularity')
axes[1, 1].set_title('Diameter vs Circularity')
axes[1, 1].axhline(0.8, color='red', linestyle='--',
                   label='Spherical threshold')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Particle diameter statistics
print("\n=== Particle Diameter Statistics ===")
print(f"Mean diameter: {df_particles['diameter'].mean():.2f} pixels")
print(f"Median(D50): {df_particles['diameter'].median():.2f} pixels")
print(f"Standard deviation: {df_particles['diameter'].std():.2f} pixels")
print(f"Minimum diameter: {df_particles['diameter'].min():.2f} pixels")
print(f"Maximum diameter: {df_particles['diameter'].max():.2f} pixels")

Particle Size Distribution Fitting (Log-Normal Distribution)

Code Example 6: Log-Normal Distribution Fitting

from scipy.stats import lognorm

# Log-normal distribution fitting
diameters = df_particles['diameter'].values
shape, loc, scale = lognorm.fit(diameters, floc=0)

# Fitting results
x = np.linspace(diameters.min(), diameters.max(), 200)
pdf_fitted = lognorm.pdf(x, shape, loc, scale)

# Visualization
plt.figure(figsize=(12, 6))

plt.subplot(1, 2, 1)
plt.hist(diameters, bins=20, density=True, alpha=0.6,
         label='Observed', edgecolor='black')
plt.plot(x, pdf_fitted, 'r-', linewidth=2,
         label=f'Log-normal fit (σ={shape:.2f})')
plt.xlabel('Diameter (pixels)')
plt.ylabel('Probability Density')
plt.title('Particle Size Distribution Fitting')
plt.legend()
plt.grid(True, alpha=0.3)

# Q-Q Plot (check goodness of fit)
plt.subplot(1, 2, 2)
theoretical_quantiles = lognorm.ppf(np.linspace(0.01, 0.99, 100),
                                    shape, loc, scale)
observed_quantiles = np.percentile(diameters, np.linspace(1, 99, 100))
plt.scatter(theoretical_quantiles, observed_quantiles, alpha=0.6)
plt.plot([diameters.min(), diameters.max()],
         [diameters.min(), diameters.max()],
         'r--', linewidth=2, label='Perfect fit')
plt.xlabel('Theoretical Quantiles')
plt.ylabel('Observed Quantiles')
plt.title('Q-Q Plot (Log-normal)')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("=== Log-Normal Distribution Parameters ===")
print(f"Shape (σ): {shape:.3f}")
print(f"Scale (median): {scale:.2f} pixels")

3.6 Image Classification using Deep Learning

Material Image Classification using Transfer Learning (VGG16)

Code Example 7: Material Phase Classification using CNN

# Requirements:
# - Python 3.9+
# - tensorflow>=2.13.0, <2.16.0

"""
Example: Material Phase Classification using CNN

Purpose: Demonstrate transfer learning for material image classification
Target: Advanced
Execution time: 30-60 seconds
Dependencies: None
"""

# TensorFlow/Keras imports
try:
    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.models import Model
    from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    TENSORFLOW_AVAILABLE = True

    # Record TensorFlow version (reproducibility)
    print(f"TensorFlow version: {tf.__version__}")
    print(f"Keras version: {keras.__version__}")

    # Fix random seed
    RANDOM_SEED = 42
    tf.random.set_seed(RANDOM_SEED)
    np.random.seed(RANDOM_SEED)
except ImportError:
    TENSORFLOW_AVAILABLE = False
    print("TensorFlow not available. Skipping this example.")

if TENSORFLOW_AVAILABLE:
    # Generate sample images (3-class classification)
    def generate_material_images(num_samples=100, img_size=128):
        """
        Generate material image samples
        Class 0: Spherical particles
        Class 1: Rod-shaped particles
        Class 2: Irregular particles
        """
        images = []
        labels = []

        for class_id in range(3):
            for _ in range(num_samples):
                img = np.zeros((img_size, img_size), dtype=np.uint8)

                if class_id == 0:  # Spherical
                    num_particles = np.random.randint(5, 15)
                    for _ in range(num_particles):
                        x = np.random.randint(20, img_size - 20)
                        y = np.random.randint(20, img_size - 20)
                        r = np.random.randint(8, 15)
                        cv2.circle(img, (x, y), r, 200, -1)

                elif class_id == 1:  # Rod-shaped
                    num_rods = np.random.randint(3, 8)
                    for _ in range(num_rods):
                        x1 = np.random.randint(10, img_size - 10)
                        y1 = np.random.randint(10, img_size - 10)
                        length = np.random.randint(30, 60)
                        angle = np.random.rand() * 2 * np.pi
                        x2 = int(x1 + length * np.cos(angle))
                        y2 = int(y1 + length * np.sin(angle))
                        cv2.line(img, (x1, y1), (x2, y2), 200, 3)

                else:  # Irregular
                    num_shapes = np.random.randint(5, 12)
                    for _ in range(num_shapes):
                        pts = np.random.randint(10, img_size - 10,
                                                size=(6, 2), dtype=np.int32)
                        cv2.fillPoly(img, [pts], 200)  # fillPoly requires int32 points

                # Add noise
                noise = np.random.normal(0, 20, img.shape)
                img = np.clip(img + noise, 0, 255).astype(np.uint8)

                # Convert to RGB
                img_rgb = cv2.cvtColor(img, cv2.COLOR_GRAY2RGB)
                images.append(img_rgb)
                labels.append(class_id)

        return np.array(images), np.array(labels)

    # Generate data
    X_data, y_data = generate_material_images(num_samples=150)

    # Split training and test data
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(
        X_data, y_data, test_size=0.2, random_state=RANDOM_SEED
    )

    # Data normalization
    X_train = X_train.astype('float32') / 255.0
    X_test = X_test.astype('float32') / 255.0

    # Load VGG16 model (ImageNet weights)
    base_model = VGG16(
        weights='imagenet',
        include_top=False,
        input_shape=(128, 128, 3)
    )

    # Freeze base model weights
    base_model.trainable = False

    # Add new classification layers
    x = base_model.output
    x = GlobalAveragePooling2D()(x)
    x = Dense(128, activation='relu')(x)
    predictions = Dense(3, activation='softmax')(x)

    model = Model(inputs=base_model.input, outputs=predictions)

    # Compile model
    model.compile(
        optimizer='adam',
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )

    # Train model
    history = model.fit(
        X_train, y_train,
        validation_split=0.2,
        epochs=10,
        batch_size=16,
        verbose=0
    )

    # Evaluate on test data
    test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)

    print("=== CNN Classification Results ===")
    print(f"Test accuracy: {test_acc * 100:.2f}%")

    # Plot learning curves
    plt.figure(figsize=(14, 5))

    plt.subplot(1, 2, 1)
    plt.plot(history.history['loss'], label='Training Loss')
    plt.plot(history.history['val_loss'], label='Validation Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.title('Training and Validation Loss')
    plt.legend()
    plt.grid(True, alpha=0.3)

    plt.subplot(1, 2, 2)
    plt.plot(history.history['accuracy'], label='Training Accuracy')
    plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.title('Training and Validation Accuracy')
    plt.legend()
    plt.grid(True, alpha=0.3)

    plt.tight_layout()
    plt.show()

    # Confusion Matrix
    from sklearn.metrics import confusion_matrix, classification_report
    y_pred = model.predict(X_test)
    y_pred_classes = np.argmax(y_pred, axis=1)

    cm = confusion_matrix(y_test, y_pred_classes)

    plt.figure(figsize=(8, 6))
    plt.imshow(cm, interpolation='nearest', cmap='Blues')
    plt.title('Confusion Matrix')
    plt.colorbar()
    tick_marks = np.arange(3)
    plt.xticks(tick_marks, ['Spherical', 'Rod', 'Irregular'])
    plt.yticks(tick_marks, ['Spherical', 'Rod', 'Irregular'])

    # Display numerical values
    for i in range(3):
        for j in range(3):
            plt.text(j, i, cm[i, j], ha='center', va='center',
                    color='white' if cm[i, j] > cm.max() / 2 else 'black')

    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.tight_layout()
    plt.show()

    print("\n=== Classification Report ===")
    print(classification_report(
        y_test, y_pred_classes,
        target_names=['Spherical', 'Rod', 'Irregular']
    ))
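
If the frozen-base accuracy is insufficient, a common second stage is to unfreeze the top convolutional block and fine-tune at a small learning rate. A sketch continuing the session above (runs only when TENSORFLOW_AVAILABLE; the 'block5' prefix follows Keras's VGG16 layer naming, and the epoch count and learning rate are illustrative):

if TENSORFLOW_AVAILABLE:
    # Unfreeze only the last VGG16 block; keep earlier features fixed
    base_model.trainable = True
    for layer in base_model.layers:
        layer.trainable = layer.name.startswith('block5')

    # Recompile with a small learning rate so the pretrained weights
    # are adjusted gently, then train briefly
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-5),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )
    model.fit(X_train, y_train, validation_split=0.2,
              epochs=5, batch_size=16, verbose=0)

    _, ft_acc = model.evaluate(X_test, y_test, verbose=0)
    print(f"Fine-tuned test accuracy: {ft_acc * 100:.2f}%")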

3.7 Integrated Image Analysis Pipeline

Building an Automated Analysis System

Code Example 8: Image Analysis Pipeline Class

from dataclasses import dataclass
from typing import List, Dict
import json

@dataclass
class ParticleAnalysisResult:
    """particlesAnalysis Results"""
    num_particles: int
    mean_diameter: float
    std_diameter: float
    mean_circularity: float
    particle_data: List[Dict]

class SEMImageAnalyzer:
    """SEM Image Automated Analysis System"""

    def __init__(self, img_size=(512, 512)):
        self.img_size = img_size
        self.image = None
        self.binary = None
        self.labels = None
        self.particles = []

    def load_image(self, image: np.ndarray):
        """Load image"""
        if len(image.shape) == 3:
            self.image = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
        else:
            self.image = image

        # Resize
        self.image = cv2.resize(self.image, self.img_size)

    def preprocess(self, denoise_strength=10):
        """Preprocessing(Noise Removal・Contrast Adjustment)"""
        # Noise Removal
        denoised = cv2.fastNlMeansDenoising(
            self.image, None, denoise_strength, 7, 21
        )

        # CLAHE
        clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
        enhanced = clahe.apply(denoised)

        self.image = enhanced

    def segment_particles(self, min_size=50):
        """particlesSegmentation"""
        # OtsuBinarization
        threshold = filters.threshold_otsu(self.image)
        binary = self.image > threshold

        # Morphology
        binary_cleaned = morphology.remove_small_objects(
            binary, min_size=min_size
        )
        binary_cleaned = morphology.remove_small_holes(
            binary_cleaned, area_threshold=min_size
        )

        # Distance transform
        distance = distance_transform_edt(binary_cleaned)

        # Watershed
        local_max = peak_local_max(
            distance,
            min_distance=20,
            threshold_abs=5,
            labels=binary_cleaned
        )

        markers = np.zeros_like(distance, dtype=int)
        markers[tuple(local_max.T)] = np.arange(1, len(local_max) + 1)

        self.labels = watershed(-distance, markers, mask=binary_cleaned)
        self.binary = binary_cleaned

    def extract_features(self):
        """Feature extraction"""
        self.particles = []

        for region in measure.regionprops(self.labels):
            area = region.area
            diameter = np.sqrt(4 * area / np.pi)
            aspect_ratio = region.major_axis_length / \
                          (region.minor_axis_length + 1e-10)
            circularity = 4 * np.pi * area / \
                         (region.perimeter ** 2 + 1e-10)

            self.particles.append({
                'label': region.label,
                'area': area,
                'diameter': diameter,
                'aspect_ratio': aspect_ratio,
                'circularity': circularity,
                'centroid': region.centroid
            })

    def get_results(self) -> ParticleAnalysisResult:
        """Get results"""
        df = pd.DataFrame(self.particles)

        return ParticleAnalysisResult(
            num_particles=len(self.particles),
            mean_diameter=df['diameter'].mean(),
            std_diameter=df['diameter'].std(),
            mean_circularity=df['circularity'].mean(),
            particle_data=self.particles
        )

    def visualize(self):
        """Visualize results"""
        fig, axes = plt.subplots(2, 2, figsize=(14, 14))

        # Original image
        axes[0, 0].imshow(self.image, cmap='gray')
        axes[0, 0].set_title('Preprocessed Image')
        axes[0, 0].axis('off')

        # Binarization
        axes[0, 1].imshow(self.binary, cmap='gray')
        axes[0, 1].set_title('Binary Segmentation')
        axes[0, 1].axis('off')

        # Watershed labels
        axes[1, 0].imshow(self.labels, cmap='nipy_spectral')
        axes[1, 0].set_title(f'Particles ({len(self.particles)})')
        axes[1, 0].axis('off')

        # Particle size distribution
        df = pd.DataFrame(self.particles)
        axes[1, 1].hist(df['diameter'], bins=20, alpha=0.7,
                       edgecolor='black')
        axes[1, 1].set_xlabel('Diameter (pixels)')
        axes[1, 1].set_ylabel('Frequency')
        axes[1, 1].set_title('Particle Size Distribution')
        axes[1, 1].grid(True, alpha=0.3, axis='y')

        plt.tight_layout()
        plt.show()

    def save_results(self, filename='analysis_results.json'):
        """Save results to JSON"""
        results = self.get_results()
        output = {
            'num_particles': int(results.num_particles),
            'mean_diameter': float(results.mean_diameter),
            'std_diameter': float(results.std_diameter),
            'mean_circularity': float(results.mean_circularity),
            'particles': results.particle_data
        }

        # NumPy scalars inside particle_data are not JSON-serializable;
        # `default=float` converts them on the fly
        with open(filename, 'w') as f:
            json.dump(output, f, indent=2, default=float)

        print(f"Results saved to {filename}")

# Usage Example
analyzer = SEMImageAnalyzer()
analyzer.load_image(noisy_image)
analyzer.preprocess(denoise_strength=10)
analyzer.segment_particles(min_size=50)
analyzer.extract_features()

results = analyzer.get_results()
print("=== Analysis Results ===")
print(f"Detected particles: {results.num_particles}")
print(f"Mean diameter: {results.mean_diameter:.2f} ± {results.std_diameter:.2f} pixels")
print(f"Mean circularity: {results.mean_circularity:.3f}")

analyzer.visualize()
analyzer.save_results('sem_analysis.json')

3.8 Practical Pitfalls and Solutions

Common Failure Examples and Best Practices

Pitfall 1: Over-segmentation

Symptom: A single particle is split into multiple regions

Cause: Watershed min_distance is too small, or residual noise remains

Solution:

# ❌ Bad Example: Excessive segmentation
local_max = peak_local_max(distance, min_distance=5)  # Too small
# Result: Noise or minor irregularities within particles are recognized as separate particles

# ✅ Good Example: Appropriate min_distance setting
# Set based on typical particle diameter
expected_diameter_pixels = 30
min_distance = int(expected_diameter_pixels * 0.6)  # About 60% of diameter

local_max = peak_local_max(
    distance,
    min_distance=min_distance,
    threshold_abs=5,  # Threshold for distance transform values
    labels=binary_cleaned
)
print(f"min_distance: {min_distance} pixels")
print(f"Number of detected particles: {len(local_max)}")

Pitfall 2: Incorrect Threshold Selection

Symptom: Part of particles is missing, or background is detected as particles

Cause: Otsu method is inappropriate (non-bimodal histogram)

Solution:

# ❌ Bad Example: Apply Otsu unconditionally to all images
threshold = filters.threshold_otsu(image)
binary = image > threshold

# ✅ Good Example: Histogram verification and fallback
threshold_otsu = filters.threshold_otsu(image)

# Check histogram shape
hist, bin_edges = np.histogram(image, bins=256, range=(0, 256))

# Simple check for bimodality (two peaks)
from scipy.signal import find_peaks
peaks, _ = find_peaks(hist, prominence=100)

if len(peaks) >= 2:
    # Bimodal → Otsu applicable
    threshold = threshold_otsu
    print(f"Otsu threshold: {threshold:.1f} (bimodal histogram)")
else:
    # Unimodal → Alternative method (Mean + Standard deviation, etc.)
    threshold = image.mean() + image.std()
    print(f"Mean+Std threshold: {threshold:.1f} (unimodal histogram)")
    print("Warning: Histogram is unimodal. Otsu method may be inappropriate")

binary = image > threshold
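
When bimodality is in doubt, skimage can also render a side-by-side comparison of its global thresholding methods (Otsu, Yen, Li, and others) so the choice can be verified visually. A minimal sketch:

from skimage.filters import try_all_threshold
import matplotlib.pyplot as plt

# Apply all of skimage's global thresholding methods to one image
fig, ax = try_all_threshold(image, figsize=(8, 10), verbose=False)
plt.show()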

Pitfall 3: Lack of Pixel Calibration

Symptom: Physical size (nm, μm) is unknown, data at different magnifications cannot be compared

Cause: Scale bar information not utilized

Solution:

# ❌ Bad Example: Report in pixel units
print(f"Mean particle diameter: {mean_diameter:.1f} pixels")  # Physical size unknown

# ✅ Good Example: Scale bar calibration
# Step 1: Measure scale bar length (manual or OCR)
SCALE_BAR_NM = 500  # Physical size of scale bar (nm)
SCALE_BAR_PIXELS = 200  # Number of pixels in scale bar

# Step 2: Calculate calibration coefficient
nm_per_pixel = SCALE_BAR_NM / SCALE_BAR_PIXELS
print(f"Calibration: {nm_per_pixel:.2f} nm/pixel")

# Step 3: Convert to physical size
mean_diameter_nm = mean_diameter * nm_per_pixel
std_diameter_nm = std_diameter * nm_per_pixel

print(f"Mean particle diameter: {mean_diameter_nm:.1f} ± {std_diameter_nm:.1f} nm")

# Step 4: Save calibration information to results
calibration_info = {
    'scale_bar_nm': SCALE_BAR_NM,
    'scale_bar_pixels': SCALE_BAR_PIXELS,
    'nm_per_pixel': nm_per_pixel,
    'calibration_date': '2025-10-19'
}
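
# Persisting the calibration alongside the analysis output keeps the
# results traceable (sketch; the filename is illustrative)
import json
with open('calibration_info.json', 'w') as f:
    json.dump(calibration_info, f, indent=2)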

Pitfall 4: CNN Overfitting (Biased Training Data)

Symptom: High training accuracy but poor performance on test data

Cause: Dataset bias, insufficient data augmentation

Solution:

# ❌ Bad Example: No data augmentation, small dataset
model.fit(X_train, y_train, epochs=50, batch_size=16)
# Result: Overfitting to training data

# ✅ Good Example: Data augmentation and Early Stopping
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import EarlyStopping

# Data augmentation settings
datagen = ImageDataGenerator(
    rotation_range=20,  # Random rotation
    width_shift_range=0.1,  # Horizontal shift
    height_shift_range=0.1,  # Vertical shift
    horizontal_flip=True,  # Horizontal flip
    zoom_range=0.1,  # Zoom
    fill_mode='nearest'
)

# Early Stopping (Stop training when validation loss does not improve)
early_stop = EarlyStopping(
    monitor='val_loss',
    patience=5,
    restore_best_weights=True
)

# Training
history = model.fit(
    datagen.flow(X_train, y_train, batch_size=16),
    validation_data=(X_test, y_test),
    epochs=50,
    callbacks=[early_stop],
    verbose=1
)

print(f"Training stopped at epoch: {len(history.history['loss'])}")

Pitfall 5: Batch Effect in Batch Processing

Symptom: Even for samples under the same conditions, particle diameter differs systematically by measurement date

Cause: Instrument drift over time, calibration deviation

Solution:

# ❌ Bad Example: No correction between batches
results = []
for image_file in image_files:
    analyzer.load_image(image_file)
    analyzer.preprocess()
    results.append(analyzer.get_results())

# ✅ Good Example: Correction using reference material
from datetime import datetime

# Measurement of reference material (known particle diameter)
STANDARD_DIAMETER_NM = 100.0  # Known physical size

def calibrate_with_standard(standard_image, expected_diameter_nm):
    """Get calibration coefficient from reference material"""
    analyzer_std = SEMImageAnalyzer()
    analyzer_std.load_image(standard_image)
    analyzer_std.preprocess()
    analyzer_std.segment_particles()
    analyzer_std.extract_features()

    results_std = analyzer_std.get_results()
    measured_diameter_pixels = results_std.mean_diameter

    nm_per_pixel = expected_diameter_nm / measured_diameter_pixels

    calibration = {
        'date': datetime.now().isoformat(),
        'nm_per_pixel': nm_per_pixel,
        'standard_diameter_nm': expected_diameter_nm,
        'measured_pixels': measured_diameter_pixels
    }

    return calibration

# Measure standard sample at the beginning of each batch
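# NOTE: `batches` and `load_standard_image` below are placeholders for your
# own batch/data-loading code; they are not defined in this chapter.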
batch_calibrations = {}
for batch_id, batch_images in batches.items():
    # Standard sample measurement
    standard_image = load_standard_image(batch_id)
    calib = calibrate_with_standard(standard_image, STANDARD_DIAMETER_NM)
    batch_calibrations[batch_id] = calib

    print(f"Batch {batch_id}: {calib['nm_per_pixel']:.3f} nm/pixel")

    # Analysis of samples in batch (apply calibration coefficient)
    for image_file in batch_images:
        analyzer.load_image(image_file)
        analyzer.preprocess()
        analyzer.segment_particles()
        analyzer.extract_features()

        results = analyzer.get_results()

        # Convert to physical size using calibration coefficient
        diameter_nm = results.mean_diameter * calib['nm_per_pixel']
        print(f"  Sample: {diameter_nm:.1f} nm")

3.9 Image Analysis Skills Checklist

Image Preprocessing Skills

Foundational Level

Applied Level

Advanced Level

Particle Detection and Segmentation Skills

Foundational Level

Applied Level

Advanced Level

Image Measurement and Calibration Skills

Foundational Level

Applied Level

Advanced Level

Deep Learning and Image Classification Skills

Foundational Level

Applied Level

Advanced Level

Integrated Skills: Overall Workflow

Foundational Level

Applied Level

Advanced Level


3.10 Comprehensive Skills Assessment

Please conduct self-assessment based on the following criteria.

Level 1: Beginner (60%+ of foundational skills)

Next Steps:
- Master Applied Level techniques (Watershed, data augmentation)
- Challenge more complex real data
- Deepen understanding of parameter meanings

Level 2: Intermediate (100% foundational + 60%+ applied)

Next Steps:
- Challenge advanced techniques (custom CNN, 3D analysis)
- Practice in research projects
- Optimize code and improve scalability

Level 3: Advanced (100% foundational + 100% applied + 60%+ advanced)

Next Steps:
- Develop original methods
- Paper writing and conference presentations
- Contribute to open source

Level 4: Expert (90%+ of all items)

Activity Examples:
- Invited talks and tutorials
- Leading collaborative research
- Participating in standardization activities


3.11 Action Plan Template

Current Level: _____

Target Level (in 3 months): _____

Priority Skills to Strengthen (select 3):




Specific Action Plan:

Week 1-2:
- [ ] Action 1: ____
- [ ] Action 2: ______

Week 3-4:
- [ ] Action 1: ____
- [ ] Action 2: ______

Week 5-8:
- [ ] Action 1: ____
- [ ] Action 2: ______

Week 9-12:
- [ ] Action 1: ____
- [ ] Action 2: ______

Evaluation Metrics:


3.12 Chapter Summary

What We Learned

  1. Data Licensing and Reproducibility
     - Utilization of image data repositories
     - Recording image metadata and calibration information
     - Best practices for code reproducibility

  2. Image Preprocessing
     - Noise removal (Gaussian, Median, Bilateral, NLM)
     - Contrast adjustment (histogram equalization, CLAHE)
     - Binarization (Otsu method)

  3. Particle Detection
     - Segmentation using the Watershed method
     - Distance transform and local maxima detection
     - Morphological operations

  4. Quantitative Analysis
     - Particle size distribution (histogram, cumulative distribution)
     - Shape parameters (circularity, aspect ratio)
     - Log-normal distribution fitting

  5. Deep Learning
     - Transfer learning (VGG16)
     - Material image classification
     - Performance evaluation using a confusion matrix

  6. Practical Pitfalls
     - Avoiding over-segmentation
     - Importance of pixel calibration
     - Correction of batch effects
     - Solutions for CNN overfitting

Key Points

Next Chapter

In Chapter 4, we will learn about time-series data and integrated analysis:
- Preprocessing of temperature and pressure sensor data
- Moving window analysis
- Anomaly detection
- Dimensionality reduction using PCA
- Automation using sklearn Pipeline

Chapter 4: Time-Series Data and Integrated Analysis →


Exercises

Problem 1 (Difficulty: easy)

Determine whether the following statements are true or false.

  1. Bilateral Filter can preserve edges better than Gaussian Filter
  2. CLAHE applies the same histogram equalization to the entire image
  3. In the Watershed method, local maxima of the distance transform are considered as particle centers
Hint:
1. Operating principle of the Bilateral Filter (it considers both spatial distance and intensity difference)
2. The meaning of "Adaptive" in CLAHE
3. The Watershed algorithm flow (distance transform → markers → watershed)
Solution Example

**Answer**:
1. **True** - The Bilateral Filter considers intensity differences, so smoothing is suppressed near edges
2. **False** - CLAHE divides the image into small regions (tiles) and performs adaptive histogram equalization for each region
3. **True** - Local maxima of the distance transform correspond to particle centers and are used as markers for Watershed

**Explanation**: In image preprocessing, it is important to select appropriate methods according to the processing purpose (noise removal vs edge preservation). Watershed is a powerful method that can separate even touching particles by utilizing the distance transform.

Problem 2 (Difficulty: medium)

Perform particle detection and particle size distribution analysis on the following SEM image data.

# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0

"""
Example: Perform particle detection and particle size distribution analysis

Purpose: Demonstrate core concepts and implementation patterns
Target: Beginner to Intermediate
Execution time: ~5 seconds
Dependencies: None
"""

import numpy as np

# Generate sample SEM image
def generate_sample_sem():
    np.random.seed(100)
    img = np.zeros((512, 512), dtype=np.uint8)

    for _ in range(40):
        x = np.random.randint(30, 482)
        y = np.random.randint(30, 482)
        r = np.random.randint(10, 25)
        cv2.circle(img, (x, y), r, 200, -1)

    noise = np.random.normal(0, 30, img.shape)
    return np.clip(img + noise, 0, 255).astype(np.uint8)

sample_image = generate_sample_sem()

Requirements:
1. Noise removal using Non-Local Means
2. Binarization using the Otsu method
3. Particle detection using the Watershed method
4. Plot the particle size distribution as a histogram
5. Output the mean particle diameter and standard deviation

Hint

**Processing Flow**:
1. Noise removal using `cv2.fastNlMeansDenoising`
2. Calculate the threshold using `filters.threshold_otsu` → binarization
3. `distance_transform_edt` + `peak_local_max` + `watershed`
4. Feature extraction per particle using `measure.regionprops`
5. Visualization using `matplotlib.pyplot.hist`
Solution Example
# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0
# - opencv-python>=4.8.0

"""
Example: Solution to Problem 2 - particle detection and size distribution analysis

Purpose: Demonstrate data visualization techniques
Target: Intermediate
Execution time: 2-5 seconds
Dependencies: None
"""

import numpy as np
import cv2
import matplotlib.pyplot as plt
from skimage import filters, morphology, measure
from skimage.feature import peak_local_max
from skimage.segmentation import watershed
from scipy.ndimage import distance_transform_edt

# Generate sample image
def generate_sample_sem():
    np.random.seed(100)
    img = np.zeros((512, 512), dtype=np.uint8)

    for _ in range(40):
        x = np.random.randint(30, 482)
        y = np.random.randint(30, 482)
        r = np.random.randint(10, 25)
        cv2.circle(img, (x, y), r, 200, -1)

    noise = np.random.normal(0, 30, img.shape)
    return np.clip(img + noise, 0, 255).astype(np.uint8)

sample_image = generate_sample_sem()

# Step 1: Noise removal
denoised = cv2.fastNlMeansDenoising(sample_image, None, 10, 7, 21)

# Step 2: Otsu binarization
threshold = filters.threshold_otsu(denoised)
binary = denoised > threshold
binary = morphology.remove_small_objects(binary, min_size=30)

# Step 3: Watershed
distance = distance_transform_edt(binary)
local_max = peak_local_max(distance, min_distance=15,
                           threshold_abs=3, labels=binary)
markers = np.zeros_like(distance, dtype=int)
markers[tuple(local_max.T)] = np.arange(1, len(local_max) + 1)
labels = watershed(-distance, markers, mask=binary)

# Step 4: Calculate particle diameters
diameters = []
for region in measure.regionprops(labels):
    area = region.area
    diameter = np.sqrt(4 * area / np.pi)
    diameters.append(diameter)

diameters = np.array(diameters)

# Step 5: Statistics and visualization
print("=== Particle Diameter Statistics ===")
print(f"Detected particles: {len(diameters)}")
print(f"MeanParticle diameter: {diameters.mean():.2f} pixels")
print(f"Standard deviation: {diameters.std():.2f} pixels")

fig, axes = plt.subplots(2, 2, figsize=(14, 14))

axes[0, 0].imshow(sample_image, cmap='gray')
axes[0, 0].set_title('Original Image')
axes[0, 0].axis('off')

axes[0, 1].imshow(binary, cmap='gray')
axes[0, 1].set_title(f'Binary (Otsu={threshold:.1f})')
axes[0, 1].axis('off')

axes[1, 0].imshow(labels, cmap='nipy_spectral')
axes[1, 0].set_title(f'Detected Particles ({len(diameters)})')
axes[1, 0].axis('off')

axes[1, 1].hist(diameters, bins=15, alpha=0.7, edgecolor='black')
axes[1, 1].axvline(diameters.mean(), color='red', linestyle='--',
                  label=f'Mean: {diameters.mean():.1f}')
axes[1, 1].set_xlabel('Diameter (pixels)')
axes[1, 1].set_ylabel('Frequency')
axes[1, 1].set_title('Particle Size Distribution')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()
**Output Example**:
=== Particle Diameter Statistics ===
Detected particles: 38
Mean particle diameter: 30.45 pixels
Standard deviation: 8.23 pixels
**Explanation**: In this example, setting min_distance=15 for Watershed suppresses over-detection of adjacent particles. The variability in particle diameter (standard deviation) is due to the synthesis process. In real data, it reflects measurement conditions and material inhomogeneity.

Problem 3 (Difficulty: hard)

Build a system to automatically process multiple SEM images and perform statistical comparison of particle size distributions.

Background: For material samples A, B, and C prepared under different synthesis conditions, 10 SEM images were taken for each. It is necessary to automatically analyze the particle size distribution of each sample and statistically compare them.

Task:
1. Automatically analyze the 30 images through batch processing
2. Visualize the particle size distribution for each sample
3. Test statistical significance using analysis of variance (ANOVA)
4. Output the results as a PDF report

Constraints:
- Measurement conditions (magnification, exposure) may differ for each image
- Some images have poor contrast
- Processing time: within 5 seconds per image

Hint

**Design Guidelines**:
1. Extend the `SEMImageAnalyzer` class
2. Adaptive preprocessing (contrast determination via histogram analysis)
3. Save results in a structured format (JSON/CSV)
4. `scipy.stats.f_oneway` for ANOVA
5. `matplotlib.backends.backend_pdf` for PDF generation
Solution Example

**Solution Overview**: Build an integrated system including batch processing, statistical analysis, and report generation.

**Implementation Code**:
from scipy.stats import f_oneway
from matplotlib.backends.backend_pdf import PdfPages

class BatchSEMAnalyzer:
    """Batch SEM image analysis system"""

    def __init__(self):
        self.results = {}

    def adaptive_preprocess(self, image):
        """Adaptive preprocessing"""
        # Contrast evaluation
        contrast = image.max() - image.min()

        if contrast < 100:  # Low contrast
            # CLAHE enhancement
            clahe = cv2.createCLAHE(clipLimit=3.0,
                                    tileGridSize=(8, 8))
            image = clahe.apply(image)

        # Noise removal (adaptive strength)
        noise_std = np.std(np.diff(image, axis=0))
        h = 10 if noise_std < 20 else 15

        denoised = cv2.fastNlMeansDenoising(image, None, h, 7, 21)
        return denoised

    def analyze_single(self, image, sample_id):
        """Single image analysis"""
        # Preprocessing
        preprocessed = self.adaptive_preprocess(image)

        # Otsu binarization
        threshold = filters.threshold_otsu(preprocessed)
        binary = preprocessed > threshold
        binary = morphology.remove_small_objects(binary, min_size=30)

        # Watershed
        distance = distance_transform_edt(binary)
        local_max = peak_local_max(distance, min_distance=15,
                                   labels=binary)
        markers = np.zeros_like(distance, dtype=int)
        markers[tuple(local_max.T)] = np.arange(1, len(local_max) + 1)
        labels = watershed(-distance, markers, mask=binary)

        # Extract particle diameter
        diameters = []
        for region in measure.regionprops(labels):
            area = region.area
            diameter = np.sqrt(4 * area / np.pi)
            diameters.append(diameter)

        return np.array(diameters)

    def batch_analyze(self, image_dict):
        """
        Batch analysis

        Parameters:
        -----------
        image_dict : dict
            {'sample_A': [img1, img2, ...], 'sample_B': [...]}
        """
        for sample_id, images in image_dict.items():
            all_diameters = []

            for img in images:
                diameters = self.analyze_single(img, sample_id)
                all_diameters.extend(diameters)

            self.results[sample_id] = np.array(all_diameters)

    def statistical_comparison(self):
        """Statistical comparison (ANOVA)"""
        groups = list(self.results.values())
        f_stat, p_value = f_oneway(*groups)

        print("=== Analysis of Variance (ANOVA) ===")
        print(f"F-statistic: {f_stat:.3f}")
        print(f"p-value: {p_value:.4f}")

        if p_value < 0.05:
            print("Conclusion: Significant difference between samples (p < 0.05)")
        else:
            print("Conclusion: No significant difference between samples (p ≥ 0.05)")

        return f_stat, p_value

    def generate_report(self, filename='sem_report.pdf'):
        """Generate PDF report"""
        with PdfPages(filename) as pdf:
            # Page 1: Particle size distribution comparison
            fig, axes = plt.subplots(2, 2, figsize=(11, 8.5))

            for i, (sample_id, diameters) in enumerate(self.results.items()):
                ax = axes.ravel()[i]
                ax.hist(diameters, bins=20, alpha=0.7, edgecolor='black')
                ax.axvline(diameters.mean(), color='red',
                          linestyle='--',
                          label=f'Mean: {diameters.mean():.1f}')
                ax.set_xlabel('Diameter (pixels)')
                ax.set_ylabel('Frequency')
                ax.set_title(f'{sample_id} (n={len(diameters)})')
                ax.legend()
                ax.grid(True, alpha=0.3, axis='y')

            # Statistical Summary
            ax = axes.ravel()[3]
            ax.axis('off')
            summary_text = "=== Statistical Summary ===\n\n"
            for sample_id, diameters in self.results.items():
                summary_text += f"{sample_id}:\n"
                summary_text += f"  Mean: {diameters.mean():.2f}\n"
                summary_text += f"  Std: {diameters.std():.2f}\n"
                summary_text += f"  n: {len(diameters)}\n\n"

            ax.text(0.1, 0.5, summary_text, fontsize=12,
                   verticalalignment='center', family='monospace')

            plt.tight_layout()
            pdf.savefig(fig)
            plt.close()

            # Page 2: Box plot comparison
            fig, ax = plt.subplots(figsize=(11, 8.5))
            data = [self.results[key] for key in self.results.keys()]
            ax.boxplot(data, labels=list(self.results.keys()))
            ax.set_ylabel('Diameter (pixels)')
            ax.set_title('Particle Size Distribution Comparison')
            ax.grid(True, alpha=0.3, axis='y')

            pdf.savefig(fig)
            plt.close()

        print(f"Report saved to {filename}")

# Demo execution
if __name__ == "__main__":
    # Generate sample data
    np.random.seed(42)

    image_dict = {}
    for sample_id, mean_size in [('Sample_A', 25),
                                   ('Sample_B', 35),
                                   ('Sample_C', 30)]:
        images = []
        for _ in range(10):
            img = np.zeros((512, 512), dtype=np.uint8)
            num_particles = np.random.randint(30, 50)

            for _ in range(num_particles):
                x = np.random.randint(30, 482)
                y = np.random.randint(30, 482)
                r = int(np.random.normal(mean_size, 5))
                r = max(10, min(40, r))
                cv2.circle(img, (x, y), r, 200, -1)

            noise = np.random.normal(0, 25, img.shape)
            img = np.clip(img + noise, 0, 255).astype(np.uint8)
            images.append(img)

        image_dict[sample_id] = images

    # Batch analysis
    analyzer = BatchSEMAnalyzer()
    analyzer.batch_analyze(image_dict)

    # Statistical comparison
    analyzer.statistical_comparison()

    # Generate report
    analyzer.generate_report('sem_comparison_report.pdf')

    print("\n=== Statistics by Sample ===")
    for sample_id, diameters in analyzer.results.items():
        print(f"{sample_id}:")
        print(f"  Number of particles: {len(diameters)}")
        print(f"  Mean: {diameters.mean():.2f} ± {diameters.std():.2f}")
**Result Example**:
=== Analysis of Variance (ANOVA) ===
F-statistic: 124.567
p-value: 0.0001
Conclusion: Significant difference between samples (p < 0.05)

=== Statistics by Sample ===
Sample_A:
  Number of particles: 423
  Mean: 25.12 ± 4.89
Sample_B:
  Number of particles: 398
  Mean: 35.34 ± 5.23
Sample_C:
  Number of particles: 415
  Mean: 30.05 ± 4.76

Report saved to sem_comparison_report.pdf
**Detailed Explanation**:
1. **Adaptive preprocessing**: Evaluate the contrast and noise level of each image and adjust parameters automatically
2. **Statistical testing**: Quantitatively evaluate particle diameter differences among the three groups using ANOVA
3. **PDF output**: Automatically generate a multi-page report (directly usable for papers and reports)

**Additional Considerations**:
- Multiple comparison using the Tukey HSD test to identify which pairs differ significantly (see the sketch below)
- Shape comparison of the particle size distributions (skewness, kurtosis)
- Image quality evaluation using machine learning (automatic exclusion of poor images)
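
For the Tukey HSD follow-up mentioned above, SciPy provides the test directly (scipy.stats.tukey_hsd, available since SciPy 1.8). A sketch assuming the demo above has populated analyzer.results:

from scipy.stats import tukey_hsd

# Pairwise comparison of the three samples after a significant ANOVA
res = tukey_hsd(analyzer.results['Sample_A'],
                analyzer.results['Sample_B'],
                analyzer.results['Sample_C'])
print(res)  # statistic, p-value, and confidence interval per pair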

References

  1. Bradski, G., & Kaehler, A. (2008). "Learning OpenCV: Computer Vision with the OpenCV Library." O'Reilly Media. ISBN: 978-0596516130

  2. van der Walt, S. et al. (2014). "scikit-image: image processing in Python." PeerJ, 2, e453. DOI: 10.7717/peerj.453

  3. Beucher, S., & Meyer, F. (1993). "The morphological approach to segmentation: the watershed transformation." Mathematical Morphology in Image Processing, 433-481.

  4. Simonyan, K., & Zisserman, A. (2015). "Very Deep Convolutional Networks for Large-Scale Image Recognition." ICLR 2015. arXiv: 1409.1556

  5. OpenCV Documentation: Image Processing. URL: https://docs.opencv.org/4.x/d2/d96/tutorial_py_table_of_contents_imgproc.html

  6. EMPIAR (Electron Microscopy Public Image Archive). URL: https://www.ebi.ac.uk/empiar/

  7. NanoMine Database. URL: https://materialsmine.org/


Navigation

← Previous Chapter

Chapter 2: Spectral Data Analysis ←

Next Chapter

Chapter 4: Time-Series Data and Integrated Analysis →

Series Index

← Return to Series Index


Author Information

Author: AI Terakoya Content Team
Created: 2025-10-17
Updated: 2025-10-19
Version: 1.1

Update History:
- 2025-10-19: v1.1 Added data licensing, practical pitfalls, and skills checklist
- 2025-10-17: v1.0 Initial release

Feedback:
- GitHub Issues: [Repository URL]/issues
- Email: yusuke.hashimoto.b8@tohoku.ac.jp

License: Creative Commons BY 4.0


Continue learning in the next chapter!
