Chapter 3: Image Data Analysis

Implement a basic pipeline for extracting particle information from SEM/TEM images. Tips for robustness against noise and overlapping are also introduced.

💡 Note: The workflow is fixed in the order: preprocessing (smoothing, binarization) → feature extraction → post-processing. The quality of annotations determines performance.

Automated Analysis of SEM/TEM Images - From Particle Detection to Deep Learning

Learning Objectives

By completing this chapter, you will be able to:

✅ Perform preprocessing of SEM/TEM images (noise removal, contrast adjustment)
✅ Implement particle detection using the Watershed method
✅ Quantify particle size distribution and shape parameters (circularity, aspect ratio)
✅ Perform material image classification using CNN (transfer learning)
✅ Build an image analysis pipeline using OpenCV and scikit-image

Reading time: 30-35minutes Code examples: 13 Exercises: 3

3.1 Characteristics of Image Data and Preprocessing Strategy

Characteristics of SEM/TEM Images

Electron microscopy images are powerful tools for visualizing nano- to micro-scale structures of materials.

Measurement Technique	Spatial Resolution	Typical Field of View	Main Information	Image Characteristics
SEM	Several nm to several μm	10μm to 1mm	Surface morphology, microstructure	Deep depth of field, shadow effects
TEM	Atomic level	Tens of nm to several μm	Internal structure, crystallinity	High contrast, diffraction patterns
STEM	Sub-nm	Tens to hundreds of nm	Atomic arrangement, elemental distribution	Atomic resolution

Typical Workflow for Image Analysis

flowchart TD A[Image Acquisition] --> B[Preprocessing] B --> C[Segmentation] C --> D[Feature Extraction] D --> E[Quantitative Analysis] E --> F[Statistical Processing] F --> G[Visualization & Reporting] B --> B1[Noise Removal] B --> B2[Contrast Adjustment] B --> B3[Binarization] style A fill:#e3f2fd style B fill:#fff3e0 style C fill:#f3e5f5 style D fill:#e8f5e9 style E fill:#fce4ec style G fill:#fff9c4

3.2 Data Licensing and Reproducibility

Image Data Repositories and Licensing

Utilization of microscopy image databases is essential for algorithm development and validation.

Major Image Databases

Database	Content	License	Access	Citation Requirements
EMPIAR (Electron Microscopy)	TEM/cryo-EM images	CC BY 4.0	Free	Required
NanoMine	Nanocomposite SEM	CC BY 4.0	Free	Recommended
Materials Data Facility	Materials image datasets	Mixed	Free	Required
Kaggle Datasets (Microscopy)	Various microscopy images	Mixed	Free	Verification required
NIST SRD	Standard reference images	Public Domain	Free	Recommended

Notes on Data Usage

Example of Using Public Data:

"""
SEM image from NanoMine database.
Reference: NanoMine Dataset L123 - Polymer Nanocomposite
Citation: Zhao, H. et al. (2016) Computational Materials Science
License: CC BY 4.0
URL: https://materialsmine.org/nm/L123
Scale bar: 500 nm (calibration: 2.5 nm/pixel)
"""

Recording Image Metadata:

IMAGE_METADATA = {
    'instrument': 'FEI Quanta 200',
    'accelerating_voltage': '20 kV',
    'magnification': '10000x',
    'working_distance': '10 mm',
    'pixel_size': 2.5,  # nm/pixel
    'scale_bar': 500,   # nm
    'detector': 'SE detector',
    'acquisition_date': '2025-10-15'
}

# Save metadata to JSON (ensuring reproducibility)
import json
with open('image_metadata.json', 'w') as f:
    json.dump(IMAGE_METADATA, f, indent=2)

Best Practices for Code Reproducibility

Recording Environment Information

# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0
# - opencv-python>=4.8.0
# - tensorflow>=2.13.0, <2.16.0

"""
Example: Recording Environment Information

Purpose: Demonstrate core concepts and implementation patterns
Target: Advanced
Execution time: ~5 seconds
Dependencies: None
"""

import sys
import cv2
import numpy as np
from skimage import __version__ as skimage_version
import tensorflow as tf

print("=== Image Analysis Environment ===")
print(f"Python: {sys.version}")
print(f"OpenCV: {cv2.__version__}")
print(f"NumPy: {np.__version__}")
print(f"scikit-image: {skimage_version}")
print(f"TensorFlow: {tf.__version__}")

# Recommended Versions (as of October 2025):
# - Python: 3.10 or higher
# - OpenCV: 4.8 or higher
# - NumPy: 1.24 or higher
# - scikit-image: 0.21 or higher
# - TensorFlow: 2.13 or higher (GPU version recommended)

Parameter Documentation

Bad Example（not reproducible）:

denoised = cv2.fastNlMeansDenoising(image, None, 10, 7, 21)  # Why these values?

Good Example（reproducible）:

# Non-Local MeansParameter Settings
NLM_H = 10  # Filter strength (corresponding to noise level, typical SEM values: 5-15)
NLM_TEMPLATE_WINDOW = 7  # Template window size (odd number, recommended: 7)
NLM_SEARCH_WINDOW = 21   # Search window size (odd number, recommended: 21)
NLM_DESCRIPTION = """
h: Adjust according to noise level. Low noise: 5-7, high noise: 10-15
templateWindowSize: 7 is standard (Smaller is faster but leaves residual noise)
searchWindowSize: 21 is standard (Larger is higher quality but slower)
"""
denoised = cv2.fastNlMeansDenoising(
    image, None, NLM_H, NLM_TEMPLATE_WINDOW, NLM_SEARCH_WINDOW
)

Recording Image Calibration

# Pixel-to-physical size conversion parameters
CALIBRATION_PARAMS = {
    'pixel_size_nm': 2.5,  # nm/pixel（Calibrated from scale bar）
    'scale_bar_length_nm': 500,  # Physical size of scale bar
    'scale_bar_pixels': 200,  # Number of pixels in scale bar
    'calibration_date': '2025-10-15',
    'calibration_method': 'Scale bar measurement',
    'uncertainty': 0.1  # nm/pixel（Calibration uncertainty）
}

# Convert pixels to physical size
def pixels_to_nm(pixels, calib_params=CALIBRATION_PARAMS):
    """Convert pixel count to physical size (nm)"""
    return pixels * calib_params['pixel_size_nm']

# Usage Example
diameter_pixels = 50
diameter_nm = pixels_to_nm(diameter_pixels)
print(f"Particle diameter: {diameter_nm:.1f} ± {CALIBRATION_PARAMS['uncertainty']*diameter_pixels:.1f} nm")

3.2 Image Preprocessing

Noise Removal

Code examples1: Comparison of Various Noise Removal Filters

# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0
# - opencv-python>=4.8.0
# - scipy>=1.11.0

import numpy as np
import matplotlib.pyplot as plt
import cv2
from skimage import filters, io
from scipy import ndimage

# Fixed random seed (ensuring reproducibility)
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)

# Generate synthetic SEM image (simulating particles)
def generate_synthetic_sem(size=512, num_particles=30):
    """Generate synthetic SEM image"""
    image = np.zeros((size, size), dtype=np.float32)

    # Random particle placement
    for _ in range(num_particles):
        x = np.random.randint(50, size - 50)
        y = np.random.randint(50, size - 50)
        radius = np.random.randint(15, 35)

        # Circular particles
        Y, X = np.ogrid[:size, :size]
        mask = (X - x)**2 + (Y - y)**2 <= radius**2
        image[mask] = 200

    # Add Gaussian noise
    noise = np.random.normal(0, 25, image.shape)
    noisy_image = np.clip(image + noise, 0, 255).astype(np.uint8)

    return noisy_image

# Generate image
noisy_image = generate_synthetic_sem()

# Various noise removal filters
gaussian_blur = cv2.GaussianBlur(noisy_image, (5, 5), 1.0)
median_filter = cv2.medianBlur(noisy_image, 5)
bilateral_filter = cv2.bilateralFilter(noisy_image, 9, 75, 75)
nlm_filter = cv2.fastNlMeansDenoising(noisy_image, None, 10, 7, 21)

# Visualization
fig, axes = plt.subplots(2, 3, figsize=(15, 10))

axes[0, 0].imshow(noisy_image, cmap='gray')
axes[0, 0].set_title('Noisy SEM Image')
axes[0, 0].axis('off')

axes[0, 1].imshow(gaussian_blur, cmap='gray')
axes[0, 1].set_title('Gaussian Blur')
axes[0, 1].axis('off')

axes[0, 2].imshow(median_filter, cmap='gray')
axes[0, 2].set_title('Median Filter')
axes[0, 2].axis('off')

axes[1, 0].imshow(bilateral_filter, cmap='gray')
axes[1, 0].set_title('Bilateral Filter')
axes[1, 0].axis('off')

axes[1, 1].imshow(nlm_filter, cmap='gray')
axes[1, 1].set_title('Non-Local Means')
axes[1, 1].axis('off')

# Noise Level Comparison
axes[1, 2].bar(['Original', 'Gaussian', 'Median', 'Bilateral', 'NLM'],
               [np.std(noisy_image),
                np.std(gaussian_blur),
                np.std(median_filter),
                np.std(bilateral_filter),
                np.std(nlm_filter)])
axes[1, 2].set_ylabel('Noise Level (std)')
axes[1, 2].set_title('Denoising Performance')
axes[1, 2].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

print("=== Noise Level (Standard Deviation) ===")
print(f"Original image: {np.std(noisy_image):.2f}")
print(f"Gaussian: {np.std(gaussian_blur):.2f}")
print(f"Median: {np.std(median_filter):.2f}")
print(f"Bilateral: {np.std(bilateral_filter):.2f}")
print(f"NLM: {np.std(nlm_filter):.2f}")

Filter Selection Guidelines: - Gaussian: Fast, smooth edges → General preprocessing - Median: Edge-preserving, robust to salt & pepper noise - Bilateral: Excellent edge preservation, moderate computational cost - Non-Local Means: Highest quality, high computational cost

Contrast Adjustment

Code examples2: Histogram Equalization and CLAHE

from skimage import exposure

# Contrast Adjustment
hist_eq = exposure.equalize_hist(noisy_image)
clahe = exposure.equalize_adapthist(noisy_image, clip_limit=0.03)

# Histogram computation
hist_original = np.histogram(noisy_image, bins=256, range=(0, 256))[0]
hist_eq_vals = np.histogram(
    (hist_eq * 255).astype(np.uint8), bins=256, range=(0, 256))[0]
hist_clahe_vals = np.histogram(
    (clahe * 255).astype(np.uint8), bins=256, range=(0, 256))[0]

# Visualization
fig = plt.figure(figsize=(16, 10))

# Images
ax1 = plt.subplot(2, 3, 1)
ax1.imshow(noisy_image, cmap='gray')
ax1.set_title('Original')
ax1.axis('off')

ax2 = plt.subplot(2, 3, 2)
ax2.imshow(hist_eq, cmap='gray')
ax2.set_title('Histogram Equalization')
ax2.axis('off')

ax3 = plt.subplot(2, 3, 3)
ax3.imshow(clahe, cmap='gray')
ax3.set_title('CLAHE (Adaptive)')
ax3.axis('off')

# Histogram
ax4 = plt.subplot(2, 3, 4)
ax4.hist(noisy_image.ravel(), bins=256, range=(0, 256), alpha=0.7)
ax4.set_xlabel('Pixel Value')
ax4.set_ylabel('Frequency')
ax4.set_title('Original Histogram')
ax4.grid(True, alpha=0.3)

ax5 = plt.subplot(2, 3, 5)
ax5.hist((hist_eq * 255).astype(np.uint8).ravel(),
         bins=256, range=(0, 256), alpha=0.7, color='orange')
ax5.set_xlabel('Pixel Value')
ax5.set_ylabel('Frequency')
ax5.set_title('Histogram Eq. Histogram')
ax5.grid(True, alpha=0.3)

ax6 = plt.subplot(2, 3, 6)
ax6.hist((clahe * 255).astype(np.uint8).ravel(),
         bins=256, range=(0, 256), alpha=0.7, color='green')
ax6.set_xlabel('Pixel Value')
ax6.set_ylabel('Frequency')
ax6.set_title('CLAHE Histogram')
ax6.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("=== Contrast Metrics ===")
print(f"Original image contrast: {noisy_image.max() - noisy_image.min()}")
print(f"Histogram Eq.: {(hist_eq * 255).max() - (hist_eq * 255).min():.1f}")
print(f"CLAHE: {(clahe * 255).max() - (clahe * 255).min():.1f}")

CLAHE（Contrast Limited Adaptive Histogram Equalization）Advantages: - Local contrast enhancement - Suppress excessive enhancement (clip_limit parameter) - Details visible in both dark and bright areas of SEM images

3.3 Particle Detection (Watershed Method)

Binarization and Distance Transform

Code examples3: Otsu method for automaticBinarization

from skimage import morphology, measure
from scipy.ndimage import distance_transform_edt

# Use image after noise removal
denoised = cv2.fastNlMeansDenoising(noisy_image, None, 10, 7, 21)

# Binarization using Otsu method
threshold = filters.threshold_otsu(denoised)
binary = denoised > threshold

# Morphological operations (small noise removal)
binary_cleaned = morphology.remove_small_objects(binary, min_size=50)
binary_cleaned = morphology.remove_small_holes(binary_cleaned, area_threshold=50)

# Distance transform
distance = distance_transform_edt(binary_cleaned)

# Visualization
fig, axes = plt.subplots(2, 2, figsize=(12, 12))

axes[0, 0].imshow(denoised, cmap='gray')
axes[0, 0].set_title('Denoised Image')
axes[0, 0].axis('off')

axes[0, 1].imshow(binary, cmap='gray')
axes[0, 1].set_title(f'Binary (Otsu threshold={threshold:.1f})')
axes[0, 1].axis('off')

axes[1, 0].imshow(binary_cleaned, cmap='gray')
axes[1, 0].set_title('After Morphology')
axes[1, 0].axis('off')

axes[1, 1].imshow(distance, cmap='jet')
axes[1, 1].set_title('Distance Transform')
axes[1, 1].axis('off')
axes[1, 1].colorbar = plt.colorbar(axes[1, 1].imshow(distance, cmap='jet'),
                                   ax=axes[1, 1])

plt.tight_layout()
plt.show()

print(f"=== Binarization Results ===")
print(f"Otsu threshold: {threshold:.1f}")
print(f"White pixel percentage: {binary_cleaned.sum() / binary_cleaned.size * 100:.1f}%")

Particle Separation using Watershed Method

Code examples4: Watershed Segmentation

from skimage.feature import peak_local_max
from skimage.segmentation import watershed

# Detect local maxima (estimate particle centers)
local_max = peak_local_max(
    distance,
    min_distance=20,
    threshold_abs=5,
    labels=binary_cleaned
)

# Create markers
markers = np.zeros_like(distance, dtype=int)
markers[tuple(local_max.T)] = np.arange(1, len(local_max) + 1)

# Execute Watershed
labels = watershed(-distance, markers, mask=binary_cleaned)

# Visualization
fig, axes = plt.subplots(1, 3, figsize=(18, 6))

# Display markers
axes[0].imshow(denoised, cmap='gray')
axes[0].plot(local_max[:, 1], local_max[:, 0], 'r+',
             markersize=12, markeredgewidth=2)
axes[0].set_title(f'Detected Centers ({len(local_max)} particles)')
axes[0].axis('off')

# Watershed labels
axes[1].imshow(labels, cmap='nipy_spectral')
axes[1].set_title('Watershed Segmentation')
axes[1].axis('off')

# Overlay contours
overlay = denoised.copy()
overlay_rgb = cv2.cvtColor(overlay, cv2.COLOR_GRAY2RGB)
for region in measure.regionprops(labels):
    minr, minc, maxr, maxc = region.bbox
    cv2.rectangle(overlay_rgb, (minc, minr), (maxc, maxr),
                  (255, 0, 0), 2)

axes[2].imshow(overlay_rgb)
axes[2].set_title('Detected Particles')
axes[2].axis('off')

plt.tight_layout()
plt.show()

print(f"=== Watershed Results ===")
print(f"Number of detected particles: {len(local_max)}")
print(f"Number of labels: {labels.max()}")

3.4 Particle Size Distribution Analysis

Extraction of Particle Features

Code examples5: Calculation of Particle Diameter and Shape Parameters

# Requirements:
# - Python 3.9+
# - pandas>=2.0.0, <2.2.0

"""
Example: Code examples5: Calculation of Particle Diameter and Shape P

Purpose: Demonstrate data visualization techniques
Target: Beginner to Intermediate
Execution time: 2-5 seconds
Dependencies: None
"""

# Feature extraction for each particle
particle_data = []

for region in measure.regionprops(labels):
    # Calculate equivalent circular diameter from area
    area = region.area
    equivalent_diameter = np.sqrt(4 * area / np.pi)

    # Aspect ratio
    major_axis = region.major_axis_length
    minor_axis = region.minor_axis_length
    aspect_ratio = major_axis / (minor_axis + 1e-10)

    # Circularity (4π × area / perimeter²)
    perimeter = region.perimeter
    circularity = 4 * np.pi * area / (perimeter ** 2 + 1e-10)

    particle_data.append({
        'label': region.label,
        'area': area,
        'diameter': equivalent_diameter,
        'aspect_ratio': aspect_ratio,
        'circularity': circularity,
        'centroid': region.centroid
    })

# DataFrameConvert to
import pandas as pd
df_particles = pd.DataFrame(particle_data)

print("=== Particle Feature Statistics ===")
print(df_particles[['diameter', 'aspect_ratio', 'circularity']].describe())

# Particle size distribution plot
fig, axes = plt.subplots(2, 2, figsize=(14, 12))

# Histogram
axes[0, 0].hist(df_particles['diameter'], bins=20, alpha=0.7,
                edgecolor='black')
axes[0, 0].set_xlabel('Diameter (pixels)')
axes[0, 0].set_ylabel('Frequency')
axes[0, 0].set_title('Particle Size Distribution')
axes[0, 0].axvline(df_particles['diameter'].mean(), color='red',
                   linestyle='--', label=f'Mean: {df_particles["diameter"].mean():.1f}')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3, axis='y')

# Cumulative distribution
sorted_diameters = np.sort(df_particles['diameter'])
cumulative = np.arange(1, len(sorted_diameters) + 1) / len(sorted_diameters) * 100
axes[0, 1].plot(sorted_diameters, cumulative, linewidth=2)
axes[0, 1].set_xlabel('Diameter (pixels)')
axes[0, 1].set_ylabel('Cumulative Percentage (%)')
axes[0, 1].set_title('Cumulative Size Distribution')
axes[0, 1].axhline(50, color='red', linestyle='--',
                   label=f'D50: {np.median(df_particles["diameter"]):.1f}')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Scatter plot (diameter vs aspect ratio)
axes[1, 0].scatter(df_particles['diameter'],
                   df_particles['aspect_ratio'],
                   alpha=0.6, s=50)
axes[1, 0].set_xlabel('Diameter (pixels)')
axes[1, 0].set_ylabel('Aspect Ratio')
axes[1, 0].set_title('Diameter vs Aspect Ratio')
axes[1, 0].grid(True, alpha=0.3)

# Scatter plot (diameter vs circularity)
axes[1, 1].scatter(df_particles['diameter'],
                   df_particles['circularity'],
                   alpha=0.6, s=50, color='green')
axes[1, 1].set_xlabel('Diameter (pixels)')
axes[1, 1].set_ylabel('Circularity')
axes[1, 1].set_title('Diameter vs Circularity')
axes[1, 1].axhline(0.8, color='red', linestyle='--',
                   label='Spherical threshold')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Particle diameter statistics
print("\n=== Particle Diameter Statistics ===")
print(f"Mean diameter: {df_particles['diameter'].mean():.2f} pixels")
print(f"Median(D50): {df_particles['diameter'].median():.2f} pixels")
print(f"Standard deviation: {df_particles['diameter'].std():.2f} pixels")
print(f"Minimum diameter: {df_particles['diameter'].min():.2f} pixels")
print(f"Maximum diameter: {df_particles['diameter'].max():.2f} pixels")

Particle Size Distribution Fitting (Log-Normal Distribution)

Code examples6: Log-Normal Distribution Fitting

from scipy.stats import lognorm

# Log-normal distribution fitting
diameters = df_particles['diameter'].values
shape, loc, scale = lognorm.fit(diameters, floc=0)

# Fitting results
x = np.linspace(diameters.min(), diameters.max(), 200)
pdf_fitted = lognorm.pdf(x, shape, loc, scale)

# Visualization
plt.figure(figsize=(12, 6))

plt.subplot(1, 2, 1)
plt.hist(diameters, bins=20, density=True, alpha=0.6,
         label='Observed', edgecolor='black')
plt.plot(x, pdf_fitted, 'r-', linewidth=2,
         label=f'Log-normal fit (σ={shape:.2f})')
plt.xlabel('Diameter (pixels)')
plt.ylabel('Probability Density')
plt.title('Particle Size Distribution Fitting')
plt.legend()
plt.grid(True, alpha=0.3)

# Q-Q Plot (check goodness of fit)
plt.subplot(1, 2, 2)
theoretical_quantiles = lognorm.ppf(np.linspace(0.01, 0.99, 100),
                                    shape, loc, scale)
observed_quantiles = np.percentile(diameters, np.linspace(1, 99, 100))
plt.scatter(theoretical_quantiles, observed_quantiles, alpha=0.6)
plt.plot([diameters.min(), diameters.max()],
         [diameters.min(), diameters.max()],
         'r--', linewidth=2, label='Perfect fit')
plt.xlabel('Theoretical Quantiles')
plt.ylabel('Observed Quantiles')
plt.title('Q-Q Plot (Log-normal)')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("=== Log-Normal Distribution Parameters ===")
print(f"Shape (σ): {shape:.3f}")
print(f"Scale (median): {scale:.2f} pixels")

3.5 Image Classification using Deep Learning

Material Image Classification using Transfer Learning (VGG16)

Code examples7: Material Phase Classification using CNN

# Requirements:
# - Python 3.9+
# - tensorflow>=2.13.0, <2.16.0

"""
Example: Code examples7: Material Phase Classification using CNN

Purpose: Demonstrate data visualization techniques
Target: Advanced
Execution time: 30-60 seconds
Dependencies: None
"""

# TensorFlow/KerasImport
try:
    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.models import Model
    from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    TENSORFLOW_AVAILABLE = True

    # TensorFlowRecord versions (reproducibility)
    print(f"TensorFlow version: {tf.__version__}")
    print(f"Keras version: {keras.__version__}")

    # Fix random seed
    RANDOM_SEED = 42
    tf.random.set_seed(RANDOM_SEED)
    np.random.seed(RANDOM_SEED)
except ImportError:
    TENSORFLOW_AVAILABLE = False
    print("TensorFlow not available. Skipping this example.")

if TENSORFLOW_AVAILABLE:
    # Generate sample images (3-class classification)
    def generate_material_images(num_samples=100, img_size=128):
        """
        Generate material image samples
        Class 0: Spherical particles
        Class 1: Rod-shaped particles
        Class 2: Irregular particles
        """
        images = []
        labels = []

        for class_id in range(3):
            for _ in range(num_samples):
                img = np.zeros((img_size, img_size), dtype=np.uint8)

                if class_id == 0:  # Spherical
                    num_particles = np.random.randint(5, 15)
                    for _ in range(num_particles):
                        x = np.random.randint(20, img_size - 20)
                        y = np.random.randint(20, img_size - 20)
                        r = np.random.randint(8, 15)
                        cv2.circle(img, (x, y), r, 200, -1)

                elif class_id == 1:  # Rod-shaped
                    num_rods = np.random.randint(3, 8)
                    for _ in range(num_rods):
                        x1 = np.random.randint(10, img_size - 10)
                        y1 = np.random.randint(10, img_size - 10)
                        length = np.random.randint(30, 60)
                        angle = np.random.rand() * 2 * np.pi
                        x2 = int(x1 + length * np.cos(angle))
                        y2 = int(y1 + length * np.sin(angle))
                        cv2.line(img, (x1, y1), (x2, y2), 200, 3)

                else:  # Irregular
                    num_shapes = np.random.randint(5, 12)
                    for _ in range(num_shapes):
                        pts = np.random.randint(10, img_size - 10,
                                                size=(6, 2))
                        cv2.fillPoly(img, [pts], 200)

                # Add noise
                noise = np.random.normal(0, 20, img.shape)
                img = np.clip(img + noise, 0, 255).astype(np.uint8)

                # Convert to RGB
                img_rgb = cv2.cvtColor(img, cv2.COLOR_GRAY2RGB)
                images.append(img_rgb)
                labels.append(class_id)

        return np.array(images), np.array(labels)

    # Generate data
    X_data, y_data = generate_material_images(num_samples=150)

    # Split training and test data
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(
        X_data, y_data, test_size=0.2, random_state=RANDOM_SEED
    )

    # Data normalization
    X_train = X_train.astype('float32') / 255.0
    X_test = X_test.astype('float32') / 255.0

    # VGG16Load model (ImageNet weights)
    base_model = VGG16(
        weights='imagenet',
        include_top=False,
        input_shape=(128, 128, 3)
    )

    # Freeze base model weights
    base_model.trainable = False

    # Add new classification layers
    x = base_model.output
    x = GlobalAveragePooling2D()(x)
    x = Dense(128, activation='relu')(x)
    predictions = Dense(3, activation='softmax')(x)

    model = Model(inputs=base_model.input, outputs=predictions)

    # Compile model
    model.compile(
        optimizer='adam',
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )

    # Train model
    history = model.fit(
        X_train, y_train,
        validation_split=0.2,
        epochs=10,
        batch_size=16,
        verbose=0
    )

    # Evaluate on test data
    test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)

    print("=== CNN Classification Results ===")
    print(f"Test accuracy: {test_acc * 100:.2f}%")

    # Plot learning curves
    plt.figure(figsize=(14, 5))

    plt.subplot(1, 2, 1)
    plt.plot(history.history['loss'], label='Training Loss')
    plt.plot(history.history['val_loss'], label='Validation Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.title('Training and Validation Loss')
    plt.legend()
    plt.grid(True, alpha=0.3)

    plt.subplot(1, 2, 2)
    plt.plot(history.history['accuracy'], label='Training Accuracy')
    plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.title('Training and Validation Accuracy')
    plt.legend()
    plt.grid(True, alpha=0.3)

    plt.tight_layout()
    plt.show()

    # Confusion Matrix
    from sklearn.metrics import confusion_matrix, classification_report
    y_pred = model.predict(X_test)
    y_pred_classes = np.argmax(y_pred, axis=1)

    cm = confusion_matrix(y_test, y_pred_classes)

    plt.figure(figsize=(8, 6))
    plt.imshow(cm, interpolation='nearest', cmap='Blues')
    plt.title('Confusion Matrix')
    plt.colorbar()
    tick_marks = np.arange(3)
    plt.xticks(tick_marks, ['Spherical', 'Rod', 'Irregular'])
    plt.yticks(tick_marks, ['Spherical', 'Rod', 'Irregular'])

    # Display numerical values
    for i in range(3):
        for j in range(3):
            plt.text(j, i, cm[i, j], ha='center', va='center',
                    color='white' if cm[i, j] > cm.max() / 2 else 'black')

    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.tight_layout()
    plt.show()

    print("\n=== Classification Report ===")
    print(classification_report(
        y_test, y_pred_classes,
        target_names=['Spherical', 'Rod', 'Irregular']
    ))

3.6 Integrated Image Analysis Pipeline

Building an Automated Analysis System

Code examples8: Image Analysis Pipeline Class

from dataclasses import dataclass
from typing import List, Dict
import json

@dataclass
class ParticleAnalysisResult:
    """particlesAnalysis Results"""
    num_particles: int
    mean_diameter: float
    std_diameter: float
    mean_circularity: float
    particle_data: List[Dict]

class SEMImageAnalyzer:
    """SEM Image Automated Analysis System"""

    def __init__(self, img_size=(512, 512)):
        self.img_size = img_size
        self.image = None
        self.binary = None
        self.labels = None
        self.particles = []

    def load_image(self, image: np.ndarray):
        """Load image"""
        if len(image.shape) == 3:
            self.image = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
        else:
            self.image = image

        # Resize
        self.image = cv2.resize(self.image, self.img_size)

    def preprocess(self, denoise_strength=10):
        """Preprocessing（Noise Removal・Contrast Adjustment）"""
        # Noise Removal
        denoised = cv2.fastNlMeansDenoising(
            self.image, None, denoise_strength, 7, 21
        )

        # CLAHE
        clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
        enhanced = clahe.apply(denoised)

        self.image = enhanced

    def segment_particles(self, min_size=50):
        """particlesSegmentation"""
        # OtsuBinarization
        threshold = filters.threshold_otsu(self.image)
        binary = self.image > threshold

        # Morphology
        binary_cleaned = morphology.remove_small_objects(
            binary, min_size=min_size
        )
        binary_cleaned = morphology.remove_small_holes(
            binary_cleaned, area_threshold=min_size
        )

        # Distance transform
        distance = distance_transform_edt(binary_cleaned)

        # Watershed
        local_max = peak_local_max(
            distance,
            min_distance=20,
            threshold_abs=5,
            labels=binary_cleaned
        )

        markers = np.zeros_like(distance, dtype=int)
        markers[tuple(local_max.T)] = np.arange(1, len(local_max) + 1)

        self.labels = watershed(-distance, markers, mask=binary_cleaned)
        self.binary = binary_cleaned

    def extract_features(self):
        """Feature extraction"""
        self.particles = []

        for region in measure.regionprops(self.labels):
            area = region.area
            diameter = np.sqrt(4 * area / np.pi)
            aspect_ratio = region.major_axis_length / \
                          (region.minor_axis_length + 1e-10)
            circularity = 4 * np.pi * area / \
                         (region.perimeter ** 2 + 1e-10)

            self.particles.append({
                'label': region.label,
                'area': area,
                'diameter': diameter,
                'aspect_ratio': aspect_ratio,
                'circularity': circularity,
                'centroid': region.centroid
            })

    def get_results(self) -> ParticleAnalysisResult:
        """Get results"""
        df = pd.DataFrame(self.particles)

        return ParticleAnalysisResult(
            num_particles=len(self.particles),
            mean_diameter=df['diameter'].mean(),
            std_diameter=df['diameter'].std(),
            mean_circularity=df['circularity'].mean(),
            particle_data=self.particles
        )

    def visualize(self):
        """Visualize results"""
        fig, axes = plt.subplots(2, 2, figsize=(14, 14))

        # Original image
        axes[0, 0].imshow(self.image, cmap='gray')
        axes[0, 0].set_title('Preprocessed Image')
        axes[0, 0].axis('off')

        # Binarization
        axes[0, 1].imshow(self.binary, cmap='gray')
        axes[0, 1].set_title('Binary Segmentation')
        axes[0, 1].axis('off')

        # Watershed labels
        axes[1, 0].imshow(self.labels, cmap='nipy_spectral')
        axes[1, 0].set_title(f'Particles ({len(self.particles)})')
        axes[1, 0].axis('off')

        # Particle size distribution
        df = pd.DataFrame(self.particles)
        axes[1, 1].hist(df['diameter'], bins=20, alpha=0.7,
                       edgecolor='black')
        axes[1, 1].set_xlabel('Diameter (pixels)')
        axes[1, 1].set_ylabel('Frequency')
        axes[1, 1].set_title('Particle Size Distribution')
        axes[1, 1].grid(True, alpha=0.3, axis='y')

        plt.tight_layout()
        plt.show()

    def save_results(self, filename='analysis_results.json'):
        """Save results to JSON"""
        results = self.get_results()
        output = {
            'num_particles': results.num_particles,
            'mean_diameter': results.mean_diameter,
            'std_diameter': results.std_diameter,
            'mean_circularity': results.mean_circularity,
            'particles': results.particle_data
        }

        with open(filename, 'w') as f:
            json.dump(output, f, indent=2)

        print(f"Results saved to {filename}")

# Usage Example
analyzer = SEMImageAnalyzer()
analyzer.load_image(noisy_image)
analyzer.preprocess(denoise_strength=10)
analyzer.segment_particles(min_size=50)
analyzer.extract_features()

results = analyzer.get_results()
print("=== Analysis Results ===")
print(f"Detected particles: {results.num_particles}")
print(f"Mean diameter: {results.mean_diameter:.2f} ± {results.std_diameter:.2f} pixels")
print(f"Mean circularity: {results.mean_circularity:.3f}")

analyzer.visualize()
analyzer.save_results('sem_analysis.json')

3.7 Practical Pitfalls and Solutions

Common Failure Examples and Best Practices

Failure 1: Excessive Segmentation (Over-segmentation)

Symptom: Single particle is divided into multiple parts

Cause: Watershed min_distance is too small, or residual noise remains

Solution:

# ❌ Bad Example: Excessive segmentation
local_max = peak_local_max(distance, min_distance=5)  # Too small
# Result: Noise or minor irregularities within particles are recognized as separate particles

# ✅ Good Example: Appropriate min_distance setting
# Set based on typical particle diameter
expected_diameter_pixels = 30
min_distance = int(expected_diameter_pixels * 0.6)  # About 60% of diameter

local_max = peak_local_max(
    distance,
    min_distance=min_distance,
    threshold_abs=5,  # Threshold for distance transform values
    labels=binary_cleaned
)
print(f"min_distance: {min_distance} pixels")
print(f"Number of detected particles: {len(local_max)}")

Pitfall 2: Incorrect Threshold Selection

Symptom: Part of particles is missing, or background is detected as particles

Cause: Otsu method is inappropriate (non-bimodal histogram)

Solution:

# ❌ Bad Example: Apply Otsu unconditionally to all images
threshold = filters.threshold_otsu(image)
binary = image > threshold

# ✅ Good Example: Histogram verification and fallback
threshold_otsu = filters.threshold_otsu(image)

# Check histogram shape
hist, bin_edges = np.histogram(image, bins=256, range=(0, 256))

# Simple check for bimodality (two peaks)
from scipy.signal import find_peaks
peaks, _ = find_peaks(hist, prominence=100)

if len(peaks) >= 2:
    # Bimodal → Otsu applicable
    threshold = threshold_otsu
    print(f"Otsu threshold: {threshold:.1f} (bimodal histogram)")
else:
    # Unimodal → Alternative method (Mean + Standard deviation, etc.)
    threshold = image.mean() + image.std()
    print(f"Mean+Std threshold: {threshold:.1f} (unimodal histogram)")
    print("Warning: Histogram is unimodal. Otsu method may be inappropriate")

binary = image > threshold

Pitfall 3: Lack of Pixel Calibration

Symptom: Physical size (nm, μm) is unknown, data at different magnifications cannot be compared

Cause: Scale bar information not utilized

Solution:

# ❌ Bad Example: Report in pixel units
print(f"Mean particle diameter: {mean_diameter:.1f} pixels")  # Physical size unknown

# ✅ Good Example: Scale bar calibration
# Step 1: Measure scale bar length (manual or OCR)
SCALE_BAR_NM = 500  # Physical size of scale bar (nm)
SCALE_BAR_PIXELS = 200  # Number of pixels in scale bar

# Step 2: Calculate calibration coefficient
nm_per_pixel = SCALE_BAR_NM / SCALE_BAR_PIXELS
print(f"Calibration: {nm_per_pixel:.2f} nm/pixel")

# Step 3: Convert to physical size
mean_diameter_nm = mean_diameter * nm_per_pixel
std_diameter_nm = std_diameter * nm_per_pixel

print(f"Mean particle diameter: {mean_diameter_nm:.1f} ± {std_diameter_nm:.1f} nm")

# Step 4: Save calibration information to results
calibration_info = {
    'scale_bar_nm': SCALE_BAR_NM,
    'scale_bar_pixels': SCALE_BAR_PIXELS,
    'nm_per_pixel': nm_per_pixel,
    'calibration_date': '2025-10-19'
}

Pitfall 4: CNN Overfitting (Biased Training Data)

Symptom: High training accuracy but poor performance on test data

Cause: Dataset bias, insufficient data augmentation

Solution:

# ❌ Bad Example: No data augmentation, small dataset
model.fit(X_train, y_train, epochs=50, batch_size=16)
# Result: Overfitting to training data

# ✅ Good Example: Data augmentation and Early Stopping
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import EarlyStopping

# Data augmentation settings
datagen = ImageDataGenerator(
    rotation_range=20,  # Random rotation
    width_shift_range=0.1,  # Horizontal shift
    height_shift_range=0.1,  # Vertical shift
    horizontal_flip=True,  # Horizontal flip
    zoom_range=0.1,  # Zoom
    fill_mode='nearest'
)

# Early Stopping (Stop training when validation loss does not improve)
early_stop = EarlyStopping(
    monitor='val_loss',
    patience=5,
    restore_best_weights=True
)

# Training
history = model.fit(
    datagen.flow(X_train, y_train, batch_size=16),
    validation_data=(X_test, y_test),
    epochs=50,
    callbacks=[early_stop],
    verbose=1
)

print(f"Training stopped at epoch: {len(history.history['loss'])}")

Pitfall 5: Batch Effect in Batch Processing

Symptom: Even for samples under the same conditions, particle diameter differs systematically by measurement date

Cause: Instrument drift over time, calibration deviation

Solution:

# ❌ Bad Example: No correction between batches
results = []
for image_file in image_files:
    analyzer.load_image(image_file)
    analyzer.preprocess()
    results.append(analyzer.get_results())

# ✅ Good Example: Correction using reference material
from datetime import datetime

# Measurement of reference material (known particle diameter)
STANDARD_DIAMETER_NM = 100.0  # Known physical size

def calibrate_with_standard(standard_image, expected_diameter_nm):
    """Get calibration coefficient from reference material"""
    analyzer_std = SEMImageAnalyzer()
    analyzer_std.load_image(standard_image)
    analyzer_std.preprocess()
    analyzer_std.segment_particles()
    analyzer_std.extract_features()

    results_std = analyzer_std.get_results()
    measured_diameter_pixels = results_std.mean_diameter

    nm_per_pixel = expected_diameter_nm / measured_diameter_pixels

    calibration = {
        'date': datetime.now().isoformat(),
        'nm_per_pixel': nm_per_pixel,
        'standard_diameter_nm': expected_diameter_nm,
        'measured_pixels': measured_diameter_pixels
    }

    return calibration

# Measure standard sample at the beginning of each batch
batch_calibrations = {}
for batch_id, batch_images in batches.items():
    # Standard sample measurement
    standard_image = load_standard_image(batch_id)
    calib = calibrate_with_standard(standard_image, STANDARD_DIAMETER_NM)
    batch_calibrations[batch_id] = calib

    print(f"Batch {batch_id}: {calib['nm_per_pixel']:.3f} nm/pixel")

    # Analysis of samples in batch (apply calibration coefficient)
    for image_file in batch_images:
        analyzer.load_image(image_file)
        analyzer.preprocess()
        analyzer.segment_particles()
        analyzer.extract_features()

        results = analyzer.get_results()

        # Convert to physical size using calibration coefficient
        diameter_nm = results.mean_diameter * calib['nm_per_pixel']
        print(f"  Sample: {diameter_nm:.1f} nm")

3.8 Image Analysis Skills Checklist

Image Preprocessing Skills

Foundational Level

[ ] Can explain characteristics of SEM/TEM images (contrast, noise)
[ ] Can load images with OpenCV and convert to grayscale
[ ] Understand the difference between histogram equalization and CLAHE
[ ] Can perform noise removal with Gaussian filter
[ ] Can perform binarization using Otsu method

Applied Level

[ ] Can appropriately set parameters for Non-Local Means filter
[ ] Can diagnose poor image contrast and select appropriate preprocessing
[ ] Can use morphological operations (opening, closing)
[ ] Can select appropriate threshold method based on histogram shape
[ ] Can combine multiple filters to build optimal preprocessing pipeline

Advanced Level

[ ] Can automatically detect image quality variations in batch processing and perform adaptive preprocessing
[ ] Can design and implement custom filters (frequency domain filters)
[ ] Can implement deep learning denoising (DnCNN)
[ ] Can quantitatively evaluate the impact of image preprocessing on subsequent analysis

Particle Detection and Segmentation Skills

Foundational Level

[ ] Understand the concept of distance transform
[ ] Can explain the basic principles of Watershed method
[ ] Can count the number of particles
[ ] Can calculate particle diameter (equivalent circular diameter)
[ ] Can visualize detection results

Applied Level

[ ] Can set Watershed min_distance parameter according to particle diameter
[ ] Can diagnose over-detection and missed detection, and adjust parameters
[ ] Can separate overlapping particles
[ ] Can calculate and interpret shape parameters (circularity, aspect ratio)
[ ] Can fit particle size distribution with statistical model (log-normal distribution)

Advanced Level

[ ] Can apply Active Contour model to complex-shaped particles
[ ] Can implement semantic segmentation using machine learning (U-Net)
[ ] Can perform 3D segmentation of 3D images (X-ray CT)
[ ] Can quantitatively evaluate segmentation accuracy (Dice coefficient, IoU)

Image Measurement and Calibration Skills

Foundational Level

[ ] Can locate scale bar position
[ ] Can manually convert pixel to physical size
[ ] Can report particle diameter in nm/μm units
[ ] Recognize the existence of measurement error

Applied Level

[ ] Can automatically calculate calibration coefficient from scale bar
[ ] Can analyze images at different magnifications uniformly
[ ] Can estimate calibration uncertainty and explicitly state uncertainty in results
[ ] Can evaluate measurement repeatability (reproducibility)
[ ] Can perform instrument calibration using reference materials

Advanced Level

[ ] Can correct for batch effects (periodic measurement of standard samples)
[ ] Can propagate measurement uncertainty (ISO GUM)
[ ] Can verify measurement value compatibility between different instruments
[ ] Can establish traceable calibration chain

Deep Learning and Image Classification Skills

Foundational Level

[ ] Understand basic CNN structure (convolutional layers, pooling layers)
[ ] Can explain the concept of transfer learning
[ ] Can split training and test data
[ ] Can evaluate model accuracy
[ ] Can interpret confusion matrix

Applied Level

[ ] Can apply data augmentation
[ ] Can prevent overfitting with Early Stopping
[ ] Can adjust hyperparameters (learning rate, batch size)
[ ] Can compare different architectures (VGG, ResNet, EfficientNet)
[ ] Can visualize decision basis using Grad-CAM

Advanced Level

[ ] Can design custom CNN architectures
[ ] Can handle imbalanced data (class weights, SMOTE)
[ ] Can implement multi-task learning (classification + regression)
[ ] Can quantify model uncertainty (Bayesian Deep Learning)
[ ] Can implement few-shot learning with limited data

Integrated Skills: Overall Workflow

Foundational Level

[ ] Can execute workflow: Load image → Preprocessing → Segmentation → Feature Extraction
[ ] Can summarize and report results in tables
[ ] Can visualize using graphs (histograms, scatter plots)

Applied Level

[ ] Can automate batch processing of multiple images
[ ] Can design processing pipeline as classes
[ ] Can save results in JSON/CSV format
[ ] Can implement error handling and logging
[ ] Can manage parameters in YAML/JSON files

Advanced Level

[ ] Can publish analysis tool with web interface (Streamlit, Gradio)
[ ] Can build scalable batch processing on cloud (AWS, GCP)
[ ] Can store and search results in database (MongoDB)
[ ] Can automate testing with CI/CD pipeline
[ ] Can automatically generate publication-quality figures and reports

3.9 Comprehensive Skills Assessment

Please conduct self-assessment based on the following criteria.

Level 1: Beginner (60%+ of foundational skills)

Can perform basic image preprocessing and particle detection
Can understand existing code and adjust parameters
Can complete analysis on simple datasets

Next Steps: - Master Applied Level techniques (Watershed, data augmentation) - Challenge more complex real data - Deepen understanding of parameter meanings

Level 2: Intermediate (100% foundational + 60%+ applied)

Can select appropriate preprocessing and segmentation for complex image data
Can implement image classification using CNN
Can perform batch processing and error handling

Next Steps: - Challenge advanced techniques (custom CNN, 3D analysis) - Practice in research projects - Optimize code and improve scalability

Level 3: Advanced (100% foundational + 100% applied + 60%+ advanced)

Can design and implement new algorithms
Can execute research paper-level analysis
Can provide tools to other researchers

Next Steps: - Develop original methods - Paper writing and conference presentations - Contribute to open source

Level 4: Expert (90%+ of all items)

Can lead the field of materials image analysis
Can develop analysis methods for new measurement techniques
Contributing to international community

Activity Examples: - Invited talks and tutorials - Leading collaborative research - Participating in standardization activities

3.10 Action Plan Template

Current Level: _____

Target Level (in 3 months): _____

Priority Skills to Strengthen (select 3):

Specific Action Plan:

Week 1-2: - [ ] Action 1: ____ - [ ] Action 2: ______

Week 3-4: - [ ] Action 1: ____ - [ ] Action 2: ______

Week 5-8: - [ ] Action 1: ____ - [ ] Action 2: ______

Week 9-12: - [ ] Action 1: ____ - [ ] Action 2: ______

Evaluation Metrics:

[ ] Complete analysis on custom dataset
[ ] Present at research meeting/seminar
[ ] Publish results on GitHub/paper

3.11 Chapter Summary

What We Learned

Data Licensing and Reproducibility - Utilization of image data repositories - Recording image metadata and calibration information - Best Practices for Code Reproducibility
Image Preprocessing - Noise Removal (Gaussian, Median, Bilateral, NLM) - Contrast Adjustment (Histogram Equalization, CLAHE) - Binarization (Otsu method)
Particle Detection - Segmentation using Watershed method - Distance transform and local maxima detection - Morphological operations
Quantitative Analysis - Particle size distribution (histogram, cumulative distribution) - Shape parameters (circularity, aspect ratio) - Log-normal distribution fitting
Deep Learning - Transfer learning (VGG16) - Material image classification - Performance evaluation using confusion matrix
Practical Pitfalls - Avoiding excessive segmentation - Importance of pixel calibration - Correction of batch effects - Solutions for CNN overfitting

Key Points

✅ Always record image metadata (magnification, scale bar)
✅ Preprocessing quality determines segmentation accuracy
✅ Parameter tuning is important for Watershed method (min_distance, threshold_abs)
✅ Report particle diameter in physical units (nm, μm)
✅ Regularly verify pixel calibration with reference materials
✅ Transfer learning enables high-accuracy classification even with limited data
✅ Prevent overfitting with data augmentation and Early Stopping

Next Chapter

In Chapter 4, we will learn about time-series data and integrated analysis: - Preprocessing of temperature and pressure sensor data - Moving window analysis - Anomaly detection - Dimensionality reduction using PCA - Automation using sklearn Pipeline

Chapter 4: Time-Series Data and Integrated Analysis →

Exercises

Problem 1 (Difficulty: easy)

Determine whether the following statements are true or false.

Bilateral Filter can preserve edges better than Gaussian Filter
CLAHE applies the same histogram equalization to the entire image
In the Watershed method, local maxima of the distance transform are considered as particle centers

Hint

1. Operating principle of Bilateral Filter (considers both spatial distance and intensity difference) 2. CLAHE "Adaptive" meaning of 3. Watershed algorithm flow (Distance transform → markers → watershed)

Solution Example

**Answer**: 1. **True** - Bilateral Filter considers intensity differences, so smoothing is suppressed near edges 2. **False** - CLAHE divides the image into small regions (tiles) and performs adaptive histogram equalization for each region 3. **True** - Local maxima of the distance transform correspond to particle centers and are used as markers for Watershed **Explanation**: In image preprocessing, it is important to select appropriate methods according to the processing purpose (noise removal vs edge preservation). Watershed is a powerful method that can separate even touching particles by utilizing distance transform.

Problem 2 (Difficulty: medium)

Perform particle detection and particle size distribution analysis on the following SEM image data.

# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0

"""
Example: Perform particle detection and particle size distribution an

Purpose: Demonstrate core concepts and implementation patterns
Target: Beginner to Intermediate
Execution time: ~5 seconds
Dependencies: None
"""

import numpy as np

# Generate sample SEM image
def generate_sample_sem():
    np.random.seed(100)
    img = np.zeros((512, 512), dtype=np.uint8)

    for _ in range(40):
        x = np.random.randint(30, 482)
        y = np.random.randint(30, 482)
        r = np.random.randint(10, 25)
        cv2.circle(img, (x, y), r, 200, -1)

    noise = np.random.normal(0, 30, img.shape)
    return np.clip(img + noise, 0, 255).astype(np.uint8)

sample_image = generate_sample_sem()

Requirements: 1. Noise removal using Non-Local Means 2. Binarization using Otsu method 3. Particle detection using Watershed method 4. Plot particle size distribution as histogram 5. Output mean particle diameter and standard deviation

Hint

**Processing Flow**: 1. Noise removal using `cv2.fastNlMeansDenoising` 2. Calculate threshold using `filters.threshold_otsu` → Binarization 3. `distance_transform_edt` + `peak_local_max` + `watershed` 4. Feature extraction for particles using `measure.regionprops` 5. Visualization using `matplotlib.pyplot.hist`

Solution Example

# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0
# - opencv-python>=4.8.0

"""
Example: Requirements:
1. Noise removal using Non-Local Means
2. Bina

Purpose: Demonstrate data visualization techniques
Target: Intermediate
Execution time: 2-5 seconds
Dependencies: None
"""

import numpy as np
import cv2
import matplotlib.pyplot as plt
from skimage import filters, morphology, measure
from skimage.feature import peak_local_max
from skimage.segmentation import watershed
from scipy.ndimage import distance_transform_edt

# Generate sample image
def generate_sample_sem():
    np.random.seed(100)
    img = np.zeros((512, 512), dtype=np.uint8)

    for _ in range(40):
        x = np.random.randint(30, 482)
        y = np.random.randint(30, 482)
        r = np.random.randint(10, 25)
        cv2.circle(img, (x, y), r, 200, -1)

    noise = np.random.normal(0, 30, img.shape)
    return np.clip(img + noise, 0, 255).astype(np.uint8)

sample_image = generate_sample_sem()

# Step1: Noise Removal
denoised = cv2.fastNlMeansDenoising(sample_image, None, 10, 7, 21)

# Step2: OtsuBinarization
threshold = filters.threshold_otsu(denoised)
binary = denoised > threshold
binary = morphology.remove_small_objects(binary, min_size=30)

# Step 3: Watershed
distance = distance_transform_edt(binary)
local_max = peak_local_max(distance, min_distance=15,
                           threshold_abs=3, labels=binary)
markers = np.zeros_like(distance, dtype=int)
markers[tuple(local_max.T)] = np.arange(1, len(local_max) + 1)
labels = watershed(-distance, markers, mask=binary)

# Step4: Calculate particle diameter
diameters = []
for region in measure.regionprops(labels):
    area = region.area
    diameter = np.sqrt(4 * area / np.pi)
    diameters.append(diameter)

diameters = np.array(diameters)

# Step5: Statistics and visualization
print("=== Particle Diameter Statistics ===")
print(f"Detected particles: {len(diameters)}")
print(f"MeanParticle diameter: {diameters.mean():.2f} pixels")
print(f"Standard deviation: {diameters.std():.2f} pixels")

fig, axes = plt.subplots(2, 2, figsize=(14, 14))

axes[0, 0].imshow(sample_image, cmap='gray')
axes[0, 0].set_title('Original Image')
axes[0, 0].axis('off')

axes[0, 1].imshow(binary, cmap='gray')
axes[0, 1].set_title(f'Binary (Otsu={threshold:.1f})')
axes[0, 1].axis('off')

axes[1, 0].imshow(labels, cmap='nipy_spectral')
axes[1, 0].set_title(f'Detected Particles ({len(diameters)})')
axes[1, 0].axis('off')

axes[1, 1].hist(diameters, bins=15, alpha=0.7, edgecolor='black')
axes[1, 1].axvline(diameters.mean(), color='red', linestyle='--',
                  label=f'Mean: {diameters.mean():.1f}')
axes[1, 1].set_xlabel('Diameter (pixels)')
axes[1, 1].set_ylabel('Frequency')
axes[1, 1].set_title('Particle Size Distribution')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

**Output Example**:

=== Particle Diameter Statistics ===
Detected particles: 38
MeanParticle diameter: 30.45 pixels
Standard deviation: 8.23 pixels

**Explanation**: In this example, setting min_distance=15 for Watershed suppresses over-detection of adjacent particles. The variability in particle diameter (standard deviation) is due to the synthesis process. In real data, it reflects measurement conditions and material inhomogeneity.

Problem 3 (Difficulty: hard)

Build a system to automatically process multiple SEM images and perform statistical comparison of particle size distributions.

Background: For material samples A, B, and C prepared under different synthesis conditions, 10 SEM images were taken for each. It is necessary to automatically analyze the particle size distribution of each sample and statistically compare them.

Task: 1. Automatically analyze 30 images through batch processing 2. Visualize particle size distribution for each sample 3. Statistical significance testing using analysis of variance (ANOVA) 4. Output results as PDF report

Constraints: - Measurement conditions (magnification, exposure) may differ for each image - Some images have poor contrast - Processing time: within 5 seconds/image

Hint

**Design Guidelines**: 1. Extend `SEMImageAnalyzer` class 2. Adaptive preprocessing (contrast determination via histogram analysis) 3. Save results in structured format (JSON/CSV) 4. `scipy.stats.f_oneway`for ANOVA 5. `matplotlib.backends.backend_pdf`for PDF generation

Solution Example

**Solution Overview**: Build an integrated system including batch processing, statistical analysis, and report generation. **Implementation Code**:

from scipy.stats import f_oneway
from matplotlib.backends.backend_pdf import PdfPages

class BatchSEMAnalyzer:
    """Batch SEM image analysis system"""

    def __init__(self):
        self.results = {}

    def adaptive_preprocess(self, image):
        """Adaptive preprocessing"""
        # Contrast evaluation
        contrast = image.max() - image.min()

        if contrast < 100:  # Low contrast
            # CLAHEEnhancement
            clahe = cv2.createCLAHE(clipLimit=3.0,
                                    tileGridSize=(8, 8))
            image = clahe.apply(image)

        # Noise removal (adaptive strength)
        noise_std = np.std(np.diff(image, axis=0))
        h = 10 if noise_std < 20 else 15

        denoised = cv2.fastNlMeansDenoising(image, None, h, 7, 21)
        return denoised

    def analyze_single(self, image, sample_id):
        """Single image analysis"""
        # Preprocessing
        preprocessed = self.adaptive_preprocess(image)

        # OtsuBinarization
        threshold = filters.threshold_otsu(preprocessed)
        binary = preprocessed > threshold
        binary = morphology.remove_small_objects(binary, min_size=30)

        # Watershed
        distance = distance_transform_edt(binary)
        local_max = peak_local_max(distance, min_distance=15,
                                   labels=binary)
        markers = np.zeros_like(distance, dtype=int)
        markers[tuple(local_max.T)] = np.arange(1, len(local_max) + 1)
        labels = watershed(-distance, markers, mask=binary)

        # Extract particle diameter
        diameters = []
        for region in measure.regionprops(labels):
            area = region.area
            diameter = np.sqrt(4 * area / np.pi)
            diameters.append(diameter)

        return np.array(diameters)

    def batch_analyze(self, image_dict):
        """
        Batch analysis

        Parameters:
        -----------
        image_dict : dict
            {'sample_A': [img1, img2, ...], 'sample_B': [...]}
        """
        for sample_id, images in image_dict.items():
            all_diameters = []

            for img in images:
                diameters = self.analyze_single(img, sample_id)
                all_diameters.extend(diameters)

            self.results[sample_id] = np.array(all_diameters)

    def statistical_comparison(self):
        """Statistical comparison (ANOVA)"""
        groups = list(self.results.values())
        f_stat, p_value = f_oneway(*groups)

        print("=== Analysis of Variance (ANOVA) ===")
        print(f"F-statistic: {f_stat:.3f}")
        print(f"p-value: {p_value:.4f}")

        if p_value < 0.05:
            print("Conclusion: Significant difference between samples (p < 0.05)")
        else:
            print("Conclusion: No significant difference between samples (p ≥ 0.05)")

        return f_stat, p_value

    def generate_report(self, filename='sem_report.pdf'):
        """Generate PDF report"""
        with PdfPages(filename) as pdf:
            # Page 1: Particle size distribution comparison
            fig, axes = plt.subplots(2, 2, figsize=(11, 8.5))

            for i, (sample_id, diameters) in enumerate(self.results.items()):
                ax = axes.ravel()[i]
                ax.hist(diameters, bins=20, alpha=0.7, edgecolor='black')
                ax.axvline(diameters.mean(), color='red',
                          linestyle='--',
                          label=f'Mean: {diameters.mean():.1f}')
                ax.set_xlabel('Diameter (pixels)')
                ax.set_ylabel('Frequency')
                ax.set_title(f'{sample_id} (n={len(diameters)})')
                ax.legend()
                ax.grid(True, alpha=0.3, axis='y')

            # Statistical Summary
            ax = axes.ravel()[3]
            ax.axis('off')
            summary_text = "=== Statistical Summary ===\n\n"
            for sample_id, diameters in self.results.items():
                summary_text += f"{sample_id}:\n"
                summary_text += f"  Mean: {diameters.mean():.2f}\n"
                summary_text += f"  Std: {diameters.std():.2f}\n"
                summary_text += f"  n: {len(diameters)}\n\n"

            ax.text(0.1, 0.5, summary_text, fontsize=12,
                   verticalalignment='center', family='monospace')

            plt.tight_layout()
            pdf.savefig(fig)
            plt.close()

            # Page 2: Box plot comparison
            fig, ax = plt.subplots(figsize=(11, 8.5))
            data = [self.results[key] for key in self.results.keys()]
            ax.boxplot(data, labels=list(self.results.keys()))
            ax.set_ylabel('Diameter (pixels)')
            ax.set_title('Particle Size Distribution Comparison')
            ax.grid(True, alpha=0.3, axis='y')

            pdf.savefig(fig)
            plt.close()

        print(f"Report saved to {filename}")

# Demo execution
if __name__ == "__main__":
    # Generate sample data
    np.random.seed(42)

    image_dict = {}
    for sample_id, mean_size in [('Sample_A', 25),
                                   ('Sample_B', 35),
                                   ('Sample_C', 30)]:
        images = []
        for _ in range(10):
            img = np.zeros((512, 512), dtype=np.uint8)
            num_particles = np.random.randint(30, 50)

            for _ in range(num_particles):
                x = np.random.randint(30, 482)
                y = np.random.randint(30, 482)
                r = int(np.random.normal(mean_size, 5))
                r = max(10, min(40, r))
                cv2.circle(img, (x, y), r, 200, -1)

            noise = np.random.normal(0, 25, img.shape)
            img = np.clip(img + noise, 0, 255).astype(np.uint8)
            images.append(img)

        image_dict[sample_id] = images

    # Batch analysis
    analyzer = BatchSEMAnalyzer()
    analyzer.batch_analyze(image_dict)

    # Statistical comparison
    analyzer.statistical_comparison()

    # Generate report
    analyzer.generate_report('sem_comparison_report.pdf')

    print("\n=== Statistics by Sample ===")
    for sample_id, diameters in analyzer.results.items():
        print(f"{sample_id}:")
        print(f"  Number of particles: {len(diameters)}")
        print(f"  Mean: {diameters.mean():.2f} ± {diameters.std():.2f}")

**Result Example**:

=== Analysis of Variance (ANOVA) ===
F-statistic: 124.567
p-value: 0.0001
Conclusion: Significant difference between samples (p < 0.05)

=== Statistics by Sample ===
Sample_A:
  Number of particles: 423
  Mean: 25.12 ± 4.89
Sample_B:
  Number of particles: 398
  Mean: 35.34 ± 5.23
Sample_C:
  Number of particles: 415
  Mean: 30.05 ± 4.76

Report saved to sem_comparison_report.pdf

**Detailed Explanation**: 1. **Adaptive preprocessing**: Evaluate contrast and noise level of each image and automatically adjust parameters 2. **Statistical testing**: Quantitatively evaluate particle diameter differences among 3 groups using ANOVA 3. **PDF output**: Automatically generate multi-page report (directly usable for papers and reports) **Additional Considerations**: - Multiple comparison using Tukey HSD test (which pairs have significant differences) - Shape comparison of particle size distribution (skewness, kurtosis) - Image quality evaluation using machine learning (automatic exclusion of poor images)

References

Bradski, G., & Kaehler, A. (2008). "Learning OpenCV: Computer Vision with the OpenCV Library." O'Reilly Media. ISBN: 978-0596516130
van der Walt, S. et al. (2014). "scikit-image: image processing in Python." PeerJ, 2, e453. DOI: 10.7717/peerj.453
Beucher, S., & Meyer, F. (1993). "The morphological approach to segmentation: the watershed transformation." Mathematical Morphology in Image Processing, 433-481.
Simonyan, K., & Zisserman, A. (2015). "Very Deep Convolutional Networks for Large-Scale Image Recognition." ICLR 2015. arXiv: 1409.1556
OpenCV Documentation: Image Processing. URL: https://docs.opencv.org/4.x/d2/d96/tutorial_py_table_of_contents_imgproc.html
EMPIAR (Electron Microscopy Public Image Archive). URL: https://www.ebi.ac.uk/empiar/
NanoMine Database. URL: https://materialsmine.org/

Navigation

Author Information

Author: AI Terakoya Content Team Created: 2025-10-17 Updated: 2025-10-19 Version: 1.1

Update History: - 2025-10-19: v1.1 Added data licensing, practical pitfalls, and skills checklist - 2025-10-17: v1.0 Initial release

Feedback: - GitHub Issues: [Repository URL]/issues - Email: yusuke.hashimoto.b8@tohoku.ac.jp

License: Creative Commons BY 4.0

Continue learning in the next chapter!

Chapter 3: Image Data Analysis

Learning Objectives

3.1 Characteristics of Image Data and Preprocessing Strategy

Characteristics of SEM/TEM Images

Typical Workflow for Image Analysis

3.2 Data Licensing and Reproducibility

Image Data Repositories and Licensing

Major Image Databases

Notes on Data Usage

Best Practices for Code Reproducibility

Recording Environment Information

Parameter Documentation

Recording Image Calibration

3.2 Image Preprocessing

Noise Removal

Contrast Adjustment

3.3 Particle Detection (Watershed Method)

Binarization and Distance Transform

Particle Separation using Watershed Method

3.4 Particle Size Distribution Analysis

Extraction of Particle Features

Particle Size Distribution Fitting (Log-Normal Distribution)

3.5 Image Classification using Deep Learning

Material Image Classification using Transfer Learning (VGG16)

3.6 Integrated Image Analysis Pipeline

Building an Automated Analysis System

3.7 Practical Pitfalls and Solutions

Common Failure Examples and Best Practices

Failure 1: Excessive Segmentation (Over-segmentation)

Pitfall 2: Incorrect Threshold Selection

Pitfall 3: Lack of Pixel Calibration

Pitfall 4: CNN Overfitting (Biased Training Data)

Pitfall 5: Batch Effect in Batch Processing

3.8 Image Analysis Skills Checklist

Image Preprocessing Skills

Foundational Level

Applied Level

Advanced Level

Particle Detection and Segmentation Skills

Foundational Level

Applied Level

Advanced Level

Image Measurement and Calibration Skills

Foundational Level

Applied Level

Advanced Level

Deep Learning and Image Classification Skills

Foundational Level

Applied Level

Advanced Level

Integrated Skills: Overall Workflow

Foundational Level

Applied Level

Advanced Level

3.9 Comprehensive Skills Assessment

Level 1: Beginner (60%+ of foundational skills)

Level 2: Intermediate (100% foundational + 60%+ applied)

Level 3: Advanced (100% foundational + 100% applied + 60%+ advanced)

Level 4: Expert (90%+ of all items)

3.10 Action Plan Template

Current Level: _____

Target Level (in 3 months): _____

Priority Skills to Strengthen (select 3):

Specific Action Plan:

Evaluation Metrics:

3.11 Chapter Summary

What We Learned

Key Points

Next Chapter

Exercises

Problem 1 (Difficulty: easy)

Problem 2 (Difficulty: medium)

Problem 3 (Difficulty: hard)

References

Navigation

← Previous Chapter

Next Chapter

Series Index

Author Information

Disclaimer