Chapter 3: Image Data Analysis
Implement a basic pipeline for extracting particle information from SEM/TEM images. Tips for robustness against noise and overlapping are also introduced.
💡 Note: The workflow is fixed in the order: preprocessing (smoothing, binarization) → feature extraction → post-processing. The quality of annotations determines performance.
Automated Analysis of SEM/TEM Images - From Particle Detection to Deep Learning
Learning Objectives
By completing this chapter, you will be able to:
- ✅ Perform preprocessing of SEM/TEM images (noise removal, contrast adjustment)
- ✅ Implement particle detection using the Watershed method
- ✅ Quantify particle size distribution and shape parameters (circularity, aspect ratio)
- ✅ Perform material image classification using CNN (transfer learning)
- ✅ Build an image analysis pipeline using OpenCV and scikit-image
Reading time: 30-35minutes Code examples: 13 Exercises: 3
3.1 Characteristics of Image Data and Preprocessing Strategy
Characteristics of SEM/TEM Images
Electron microscopy images are powerful tools for visualizing nano- to micro-scale structures of materials.
| Measurement Technique | Spatial Resolution | Typical Field of View | Main Information | Image Characteristics |
|---|---|---|---|---|
| SEM | Several nm to several μm | 10μm to 1mm | Surface morphology, microstructure | Deep depth of field, shadow effects |
| TEM | Atomic level | Tens of nm to several μm | Internal structure, crystallinity | High contrast, diffraction patterns |
| STEM | Sub-nm | Tens to hundreds of nm | Atomic arrangement, elemental distribution | Atomic resolution |
Typical Workflow for Image Analysis
3.2 Data Licensing and Reproducibility
Image Data Repositories and Licensing
Utilization of microscopy image databases is essential for algorithm development and validation.
Major Image Databases
| Database | Content | License | Access | Citation Requirements |
|---|---|---|---|---|
| EMPIAR (Electron Microscopy) | TEM/cryo-EM images | CC BY 4.0 | Free | Required |
| NanoMine | Nanocomposite SEM | CC BY 4.0 | Free | Recommended |
| Materials Data Facility | Materials image datasets | Mixed | Free | Required |
| Kaggle Datasets (Microscopy) | Various microscopy images | Mixed | Free | Verification required |
| NIST SRD | Standard reference images | Public Domain | Free | Recommended |
Notes on Data Usage
Example of Using Public Data:
"""
SEM image from NanoMine database.
Reference: NanoMine Dataset L123 - Polymer Nanocomposite
Citation: Zhao, H. et al. (2016) Computational Materials Science
License: CC BY 4.0
URL: https://materialsmine.org/nm/L123
Scale bar: 500 nm (calibration: 2.5 nm/pixel)
"""
Recording Image Metadata:
IMAGE_METADATA = {
'instrument': 'FEI Quanta 200',
'accelerating_voltage': '20 kV',
'magnification': '10000x',
'working_distance': '10 mm',
'pixel_size': 2.5, # nm/pixel
'scale_bar': 500, # nm
'detector': 'SE detector',
'acquisition_date': '2025-10-15'
}
# Save metadata to JSON (ensuring reproducibility)
import json
with open('image_metadata.json', 'w') as f:
json.dump(IMAGE_METADATA, f, indent=2)
Best Practices for Code Reproducibility
Recording Environment Information
# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0
# - opencv-python>=4.8.0
# - tensorflow>=2.13.0, <2.16.0
"""
Example: Recording Environment Information
Purpose: Demonstrate core concepts and implementation patterns
Target: Advanced
Execution time: ~5 seconds
Dependencies: None
"""
import sys
import cv2
import numpy as np
from skimage import __version__ as skimage_version
import tensorflow as tf
print("=== Image Analysis Environment ===")
print(f"Python: {sys.version}")
print(f"OpenCV: {cv2.__version__}")
print(f"NumPy: {np.__version__}")
print(f"scikit-image: {skimage_version}")
print(f"TensorFlow: {tf.__version__}")
# Recommended Versions (as of October 2025):
# - Python: 3.10 or higher
# - OpenCV: 4.8 or higher
# - NumPy: 1.24 or higher
# - scikit-image: 0.21 or higher
# - TensorFlow: 2.13 or higher (GPU version recommended)
Parameter Documentation
Bad Example(not reproducible):
denoised = cv2.fastNlMeansDenoising(image, None, 10, 7, 21) # Why these values?
Good Example(reproducible):
# Non-Local MeansParameter Settings
NLM_H = 10 # Filter strength (corresponding to noise level, typical SEM values: 5-15)
NLM_TEMPLATE_WINDOW = 7 # Template window size (odd number, recommended: 7)
NLM_SEARCH_WINDOW = 21 # Search window size (odd number, recommended: 21)
NLM_DESCRIPTION = """
h: Adjust according to noise level. Low noise: 5-7, high noise: 10-15
templateWindowSize: 7 is standard (Smaller is faster but leaves residual noise)
searchWindowSize: 21 is standard (Larger is higher quality but slower)
"""
denoised = cv2.fastNlMeansDenoising(
image, None, NLM_H, NLM_TEMPLATE_WINDOW, NLM_SEARCH_WINDOW
)
Recording Image Calibration
# Pixel-to-physical size conversion parameters
CALIBRATION_PARAMS = {
'pixel_size_nm': 2.5, # nm/pixel(Calibrated from scale bar)
'scale_bar_length_nm': 500, # Physical size of scale bar
'scale_bar_pixels': 200, # Number of pixels in scale bar
'calibration_date': '2025-10-15',
'calibration_method': 'Scale bar measurement',
'uncertainty': 0.1 # nm/pixel(Calibration uncertainty)
}
# Convert pixels to physical size
def pixels_to_nm(pixels, calib_params=CALIBRATION_PARAMS):
"""Convert pixel count to physical size (nm)"""
return pixels * calib_params['pixel_size_nm']
# Usage Example
diameter_pixels = 50
diameter_nm = pixels_to_nm(diameter_pixels)
print(f"Particle diameter: {diameter_nm:.1f} ± {CALIBRATION_PARAMS['uncertainty']*diameter_pixels:.1f} nm")
3.2 Image Preprocessing
Noise Removal
Code examples1: Comparison of Various Noise Removal Filters
# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0
# - opencv-python>=4.8.0
# - scipy>=1.11.0
import numpy as np
import matplotlib.pyplot as plt
import cv2
from skimage import filters, io
from scipy import ndimage
# Fixed random seed (ensuring reproducibility)
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
# Generate synthetic SEM image (simulating particles)
def generate_synthetic_sem(size=512, num_particles=30):
"""Generate synthetic SEM image"""
image = np.zeros((size, size), dtype=np.float32)
# Random particle placement
for _ in range(num_particles):
x = np.random.randint(50, size - 50)
y = np.random.randint(50, size - 50)
radius = np.random.randint(15, 35)
# Circular particles
Y, X = np.ogrid[:size, :size]
mask = (X - x)**2 + (Y - y)**2 <= radius**2
image[mask] = 200
# Add Gaussian noise
noise = np.random.normal(0, 25, image.shape)
noisy_image = np.clip(image + noise, 0, 255).astype(np.uint8)
return noisy_image
# Generate image
noisy_image = generate_synthetic_sem()
# Various noise removal filters
gaussian_blur = cv2.GaussianBlur(noisy_image, (5, 5), 1.0)
median_filter = cv2.medianBlur(noisy_image, 5)
bilateral_filter = cv2.bilateralFilter(noisy_image, 9, 75, 75)
nlm_filter = cv2.fastNlMeansDenoising(noisy_image, None, 10, 7, 21)
# Visualization
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes[0, 0].imshow(noisy_image, cmap='gray')
axes[0, 0].set_title('Noisy SEM Image')
axes[0, 0].axis('off')
axes[0, 1].imshow(gaussian_blur, cmap='gray')
axes[0, 1].set_title('Gaussian Blur')
axes[0, 1].axis('off')
axes[0, 2].imshow(median_filter, cmap='gray')
axes[0, 2].set_title('Median Filter')
axes[0, 2].axis('off')
axes[1, 0].imshow(bilateral_filter, cmap='gray')
axes[1, 0].set_title('Bilateral Filter')
axes[1, 0].axis('off')
axes[1, 1].imshow(nlm_filter, cmap='gray')
axes[1, 1].set_title('Non-Local Means')
axes[1, 1].axis('off')
# Noise Level Comparison
axes[1, 2].bar(['Original', 'Gaussian', 'Median', 'Bilateral', 'NLM'],
[np.std(noisy_image),
np.std(gaussian_blur),
np.std(median_filter),
np.std(bilateral_filter),
np.std(nlm_filter)])
axes[1, 2].set_ylabel('Noise Level (std)')
axes[1, 2].set_title('Denoising Performance')
axes[1, 2].grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()
print("=== Noise Level (Standard Deviation) ===")
print(f"Original image: {np.std(noisy_image):.2f}")
print(f"Gaussian: {np.std(gaussian_blur):.2f}")
print(f"Median: {np.std(median_filter):.2f}")
print(f"Bilateral: {np.std(bilateral_filter):.2f}")
print(f"NLM: {np.std(nlm_filter):.2f}")
Filter Selection Guidelines: - Gaussian: Fast, smooth edges → General preprocessing - Median: Edge-preserving, robust to salt & pepper noise - Bilateral: Excellent edge preservation, moderate computational cost - Non-Local Means: Highest quality, high computational cost
Contrast Adjustment
Code examples2: Histogram Equalization and CLAHE
from skimage import exposure
# Contrast Adjustment
hist_eq = exposure.equalize_hist(noisy_image)
clahe = exposure.equalize_adapthist(noisy_image, clip_limit=0.03)
# Histogram computation
hist_original = np.histogram(noisy_image, bins=256, range=(0, 256))[0]
hist_eq_vals = np.histogram(
(hist_eq * 255).astype(np.uint8), bins=256, range=(0, 256))[0]
hist_clahe_vals = np.histogram(
(clahe * 255).astype(np.uint8), bins=256, range=(0, 256))[0]
# Visualization
fig = plt.figure(figsize=(16, 10))
# Images
ax1 = plt.subplot(2, 3, 1)
ax1.imshow(noisy_image, cmap='gray')
ax1.set_title('Original')
ax1.axis('off')
ax2 = plt.subplot(2, 3, 2)
ax2.imshow(hist_eq, cmap='gray')
ax2.set_title('Histogram Equalization')
ax2.axis('off')
ax3 = plt.subplot(2, 3, 3)
ax3.imshow(clahe, cmap='gray')
ax3.set_title('CLAHE (Adaptive)')
ax3.axis('off')
# Histogram
ax4 = plt.subplot(2, 3, 4)
ax4.hist(noisy_image.ravel(), bins=256, range=(0, 256), alpha=0.7)
ax4.set_xlabel('Pixel Value')
ax4.set_ylabel('Frequency')
ax4.set_title('Original Histogram')
ax4.grid(True, alpha=0.3)
ax5 = plt.subplot(2, 3, 5)
ax5.hist((hist_eq * 255).astype(np.uint8).ravel(),
bins=256, range=(0, 256), alpha=0.7, color='orange')
ax5.set_xlabel('Pixel Value')
ax5.set_ylabel('Frequency')
ax5.set_title('Histogram Eq. Histogram')
ax5.grid(True, alpha=0.3)
ax6 = plt.subplot(2, 3, 6)
ax6.hist((clahe * 255).astype(np.uint8).ravel(),
bins=256, range=(0, 256), alpha=0.7, color='green')
ax6.set_xlabel('Pixel Value')
ax6.set_ylabel('Frequency')
ax6.set_title('CLAHE Histogram')
ax6.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
print("=== Contrast Metrics ===")
print(f"Original image contrast: {noisy_image.max() - noisy_image.min()}")
print(f"Histogram Eq.: {(hist_eq * 255).max() - (hist_eq * 255).min():.1f}")
print(f"CLAHE: {(clahe * 255).max() - (clahe * 255).min():.1f}")
CLAHE(Contrast Limited Adaptive Histogram Equalization)Advantages: - Local contrast enhancement - Suppress excessive enhancement (clip_limit parameter) - Details visible in both dark and bright areas of SEM images
3.3 Particle Detection (Watershed Method)
Binarization and Distance Transform
Code examples3: Otsu method for automaticBinarization
from skimage import morphology, measure
from scipy.ndimage import distance_transform_edt
# Use image after noise removal
denoised = cv2.fastNlMeansDenoising(noisy_image, None, 10, 7, 21)
# Binarization using Otsu method
threshold = filters.threshold_otsu(denoised)
binary = denoised > threshold
# Morphological operations (small noise removal)
binary_cleaned = morphology.remove_small_objects(binary, min_size=50)
binary_cleaned = morphology.remove_small_holes(binary_cleaned, area_threshold=50)
# Distance transform
distance = distance_transform_edt(binary_cleaned)
# Visualization
fig, axes = plt.subplots(2, 2, figsize=(12, 12))
axes[0, 0].imshow(denoised, cmap='gray')
axes[0, 0].set_title('Denoised Image')
axes[0, 0].axis('off')
axes[0, 1].imshow(binary, cmap='gray')
axes[0, 1].set_title(f'Binary (Otsu threshold={threshold:.1f})')
axes[0, 1].axis('off')
axes[1, 0].imshow(binary_cleaned, cmap='gray')
axes[1, 0].set_title('After Morphology')
axes[1, 0].axis('off')
axes[1, 1].imshow(distance, cmap='jet')
axes[1, 1].set_title('Distance Transform')
axes[1, 1].axis('off')
axes[1, 1].colorbar = plt.colorbar(axes[1, 1].imshow(distance, cmap='jet'),
ax=axes[1, 1])
plt.tight_layout()
plt.show()
print(f"=== Binarization Results ===")
print(f"Otsu threshold: {threshold:.1f}")
print(f"White pixel percentage: {binary_cleaned.sum() / binary_cleaned.size * 100:.1f}%")
Particle Separation using Watershed Method
Code examples4: Watershed Segmentation
from skimage.feature import peak_local_max
from skimage.segmentation import watershed
# Detect local maxima (estimate particle centers)
local_max = peak_local_max(
distance,
min_distance=20,
threshold_abs=5,
labels=binary_cleaned
)
# Create markers
markers = np.zeros_like(distance, dtype=int)
markers[tuple(local_max.T)] = np.arange(1, len(local_max) + 1)
# Execute Watershed
labels = watershed(-distance, markers, mask=binary_cleaned)
# Visualization
fig, axes = plt.subplots(1, 3, figsize=(18, 6))
# Display markers
axes[0].imshow(denoised, cmap='gray')
axes[0].plot(local_max[:, 1], local_max[:, 0], 'r+',
markersize=12, markeredgewidth=2)
axes[0].set_title(f'Detected Centers ({len(local_max)} particles)')
axes[0].axis('off')
# Watershed labels
axes[1].imshow(labels, cmap='nipy_spectral')
axes[1].set_title('Watershed Segmentation')
axes[1].axis('off')
# Overlay contours
overlay = denoised.copy()
overlay_rgb = cv2.cvtColor(overlay, cv2.COLOR_GRAY2RGB)
for region in measure.regionprops(labels):
minr, minc, maxr, maxc = region.bbox
cv2.rectangle(overlay_rgb, (minc, minr), (maxc, maxr),
(255, 0, 0), 2)
axes[2].imshow(overlay_rgb)
axes[2].set_title('Detected Particles')
axes[2].axis('off')
plt.tight_layout()
plt.show()
print(f"=== Watershed Results ===")
print(f"Number of detected particles: {len(local_max)}")
print(f"Number of labels: {labels.max()}")
3.4 Particle Size Distribution Analysis
Extraction of Particle Features
Code examples5: Calculation of Particle Diameter and Shape Parameters
# Requirements:
# - Python 3.9+
# - pandas>=2.0.0, <2.2.0
"""
Example: Code examples5: Calculation of Particle Diameter and Shape P
Purpose: Demonstrate data visualization techniques
Target: Beginner to Intermediate
Execution time: 2-5 seconds
Dependencies: None
"""
# Feature extraction for each particle
particle_data = []
for region in measure.regionprops(labels):
# Calculate equivalent circular diameter from area
area = region.area
equivalent_diameter = np.sqrt(4 * area / np.pi)
# Aspect ratio
major_axis = region.major_axis_length
minor_axis = region.minor_axis_length
aspect_ratio = major_axis / (minor_axis + 1e-10)
# Circularity (4π × area / perimeter²)
perimeter = region.perimeter
circularity = 4 * np.pi * area / (perimeter ** 2 + 1e-10)
particle_data.append({
'label': region.label,
'area': area,
'diameter': equivalent_diameter,
'aspect_ratio': aspect_ratio,
'circularity': circularity,
'centroid': region.centroid
})
# DataFrameConvert to
import pandas as pd
df_particles = pd.DataFrame(particle_data)
print("=== Particle Feature Statistics ===")
print(df_particles[['diameter', 'aspect_ratio', 'circularity']].describe())
# Particle size distribution plot
fig, axes = plt.subplots(2, 2, figsize=(14, 12))
# Histogram
axes[0, 0].hist(df_particles['diameter'], bins=20, alpha=0.7,
edgecolor='black')
axes[0, 0].set_xlabel('Diameter (pixels)')
axes[0, 0].set_ylabel('Frequency')
axes[0, 0].set_title('Particle Size Distribution')
axes[0, 0].axvline(df_particles['diameter'].mean(), color='red',
linestyle='--', label=f'Mean: {df_particles["diameter"].mean():.1f}')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3, axis='y')
# Cumulative distribution
sorted_diameters = np.sort(df_particles['diameter'])
cumulative = np.arange(1, len(sorted_diameters) + 1) / len(sorted_diameters) * 100
axes[0, 1].plot(sorted_diameters, cumulative, linewidth=2)
axes[0, 1].set_xlabel('Diameter (pixels)')
axes[0, 1].set_ylabel('Cumulative Percentage (%)')
axes[0, 1].set_title('Cumulative Size Distribution')
axes[0, 1].axhline(50, color='red', linestyle='--',
label=f'D50: {np.median(df_particles["diameter"]):.1f}')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)
# Scatter plot (diameter vs aspect ratio)
axes[1, 0].scatter(df_particles['diameter'],
df_particles['aspect_ratio'],
alpha=0.6, s=50)
axes[1, 0].set_xlabel('Diameter (pixels)')
axes[1, 0].set_ylabel('Aspect Ratio')
axes[1, 0].set_title('Diameter vs Aspect Ratio')
axes[1, 0].grid(True, alpha=0.3)
# Scatter plot (diameter vs circularity)
axes[1, 1].scatter(df_particles['diameter'],
df_particles['circularity'],
alpha=0.6, s=50, color='green')
axes[1, 1].set_xlabel('Diameter (pixels)')
axes[1, 1].set_ylabel('Circularity')
axes[1, 1].set_title('Diameter vs Circularity')
axes[1, 1].axhline(0.8, color='red', linestyle='--',
label='Spherical threshold')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Particle diameter statistics
print("\n=== Particle Diameter Statistics ===")
print(f"Mean diameter: {df_particles['diameter'].mean():.2f} pixels")
print(f"Median(D50): {df_particles['diameter'].median():.2f} pixels")
print(f"Standard deviation: {df_particles['diameter'].std():.2f} pixels")
print(f"Minimum diameter: {df_particles['diameter'].min():.2f} pixels")
print(f"Maximum diameter: {df_particles['diameter'].max():.2f} pixels")
Particle Size Distribution Fitting (Log-Normal Distribution)
Code examples6: Log-Normal Distribution Fitting
from scipy.stats import lognorm
# Log-normal distribution fitting
diameters = df_particles['diameter'].values
shape, loc, scale = lognorm.fit(diameters, floc=0)
# Fitting results
x = np.linspace(diameters.min(), diameters.max(), 200)
pdf_fitted = lognorm.pdf(x, shape, loc, scale)
# Visualization
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.hist(diameters, bins=20, density=True, alpha=0.6,
label='Observed', edgecolor='black')
plt.plot(x, pdf_fitted, 'r-', linewidth=2,
label=f'Log-normal fit (σ={shape:.2f})')
plt.xlabel('Diameter (pixels)')
plt.ylabel('Probability Density')
plt.title('Particle Size Distribution Fitting')
plt.legend()
plt.grid(True, alpha=0.3)
# Q-Q Plot (check goodness of fit)
plt.subplot(1, 2, 2)
theoretical_quantiles = lognorm.ppf(np.linspace(0.01, 0.99, 100),
shape, loc, scale)
observed_quantiles = np.percentile(diameters, np.linspace(1, 99, 100))
plt.scatter(theoretical_quantiles, observed_quantiles, alpha=0.6)
plt.plot([diameters.min(), diameters.max()],
[diameters.min(), diameters.max()],
'r--', linewidth=2, label='Perfect fit')
plt.xlabel('Theoretical Quantiles')
plt.ylabel('Observed Quantiles')
plt.title('Q-Q Plot (Log-normal)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
print("=== Log-Normal Distribution Parameters ===")
print(f"Shape (σ): {shape:.3f}")
print(f"Scale (median): {scale:.2f} pixels")
3.5 Image Classification using Deep Learning
Material Image Classification using Transfer Learning (VGG16)
Code examples7: Material Phase Classification using CNN
# Requirements:
# - Python 3.9+
# - tensorflow>=2.13.0, <2.16.0
"""
Example: Code examples7: Material Phase Classification using CNN
Purpose: Demonstrate data visualization techniques
Target: Advanced
Execution time: 30-60 seconds
Dependencies: None
"""
# TensorFlow/KerasImport
try:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.preprocessing.image import ImageDataGenerator
TENSORFLOW_AVAILABLE = True
# TensorFlowRecord versions (reproducibility)
print(f"TensorFlow version: {tf.__version__}")
print(f"Keras version: {keras.__version__}")
# Fix random seed
RANDOM_SEED = 42
tf.random.set_seed(RANDOM_SEED)
np.random.seed(RANDOM_SEED)
except ImportError:
TENSORFLOW_AVAILABLE = False
print("TensorFlow not available. Skipping this example.")
if TENSORFLOW_AVAILABLE:
# Generate sample images (3-class classification)
def generate_material_images(num_samples=100, img_size=128):
"""
Generate material image samples
Class 0: Spherical particles
Class 1: Rod-shaped particles
Class 2: Irregular particles
"""
images = []
labels = []
for class_id in range(3):
for _ in range(num_samples):
img = np.zeros((img_size, img_size), dtype=np.uint8)
if class_id == 0: # Spherical
num_particles = np.random.randint(5, 15)
for _ in range(num_particles):
x = np.random.randint(20, img_size - 20)
y = np.random.randint(20, img_size - 20)
r = np.random.randint(8, 15)
cv2.circle(img, (x, y), r, 200, -1)
elif class_id == 1: # Rod-shaped
num_rods = np.random.randint(3, 8)
for _ in range(num_rods):
x1 = np.random.randint(10, img_size - 10)
y1 = np.random.randint(10, img_size - 10)
length = np.random.randint(30, 60)
angle = np.random.rand() * 2 * np.pi
x2 = int(x1 + length * np.cos(angle))
y2 = int(y1 + length * np.sin(angle))
cv2.line(img, (x1, y1), (x2, y2), 200, 3)
else: # Irregular
num_shapes = np.random.randint(5, 12)
for _ in range(num_shapes):
pts = np.random.randint(10, img_size - 10,
size=(6, 2))
cv2.fillPoly(img, [pts], 200)
# Add noise
noise = np.random.normal(0, 20, img.shape)
img = np.clip(img + noise, 0, 255).astype(np.uint8)
# Convert to RGB
img_rgb = cv2.cvtColor(img, cv2.COLOR_GRAY2RGB)
images.append(img_rgb)
labels.append(class_id)
return np.array(images), np.array(labels)
# Generate data
X_data, y_data = generate_material_images(num_samples=150)
# Split training and test data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
X_data, y_data, test_size=0.2, random_state=RANDOM_SEED
)
# Data normalization
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0
# VGG16Load model (ImageNet weights)
base_model = VGG16(
weights='imagenet',
include_top=False,
input_shape=(128, 128, 3)
)
# Freeze base model weights
base_model.trainable = False
# Add new classification layers
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(128, activation='relu')(x)
predictions = Dense(3, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)
# Compile model
model.compile(
optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy']
)
# Train model
history = model.fit(
X_train, y_train,
validation_split=0.2,
epochs=10,
batch_size=16,
verbose=0
)
# Evaluate on test data
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print("=== CNN Classification Results ===")
print(f"Test accuracy: {test_acc * 100:.2f}%")
# Plot learning curves
plt.figure(figsize=(14, 5))
plt.subplot(1, 2, 1)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training and Validation Loss')
plt.legend()
plt.grid(True, alpha=0.3)
plt.subplot(1, 2, 2)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.title('Training and Validation Accuracy')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Confusion Matrix
from sklearn.metrics import confusion_matrix, classification_report
y_pred = model.predict(X_test)
y_pred_classes = np.argmax(y_pred, axis=1)
cm = confusion_matrix(y_test, y_pred_classes)
plt.figure(figsize=(8, 6))
plt.imshow(cm, interpolation='nearest', cmap='Blues')
plt.title('Confusion Matrix')
plt.colorbar()
tick_marks = np.arange(3)
plt.xticks(tick_marks, ['Spherical', 'Rod', 'Irregular'])
plt.yticks(tick_marks, ['Spherical', 'Rod', 'Irregular'])
# Display numerical values
for i in range(3):
for j in range(3):
plt.text(j, i, cm[i, j], ha='center', va='center',
color='white' if cm[i, j] > cm.max() / 2 else 'black')
plt.ylabel('True label')
plt.xlabel('Predicted label')
plt.tight_layout()
plt.show()
print("\n=== Classification Report ===")
print(classification_report(
y_test, y_pred_classes,
target_names=['Spherical', 'Rod', 'Irregular']
))
3.6 Integrated Image Analysis Pipeline
Building an Automated Analysis System
Code examples8: Image Analysis Pipeline Class
from dataclasses import dataclass
from typing import List, Dict
import json
@dataclass
class ParticleAnalysisResult:
"""particlesAnalysis Results"""
num_particles: int
mean_diameter: float
std_diameter: float
mean_circularity: float
particle_data: List[Dict]
class SEMImageAnalyzer:
"""SEM Image Automated Analysis System"""
def __init__(self, img_size=(512, 512)):
self.img_size = img_size
self.image = None
self.binary = None
self.labels = None
self.particles = []
def load_image(self, image: np.ndarray):
"""Load image"""
if len(image.shape) == 3:
self.image = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
else:
self.image = image
# Resize
self.image = cv2.resize(self.image, self.img_size)
def preprocess(self, denoise_strength=10):
"""Preprocessing(Noise Removal・Contrast Adjustment)"""
# Noise Removal
denoised = cv2.fastNlMeansDenoising(
self.image, None, denoise_strength, 7, 21
)
# CLAHE
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(denoised)
self.image = enhanced
def segment_particles(self, min_size=50):
"""particlesSegmentation"""
# OtsuBinarization
threshold = filters.threshold_otsu(self.image)
binary = self.image > threshold
# Morphology
binary_cleaned = morphology.remove_small_objects(
binary, min_size=min_size
)
binary_cleaned = morphology.remove_small_holes(
binary_cleaned, area_threshold=min_size
)
# Distance transform
distance = distance_transform_edt(binary_cleaned)
# Watershed
local_max = peak_local_max(
distance,
min_distance=20,
threshold_abs=5,
labels=binary_cleaned
)
markers = np.zeros_like(distance, dtype=int)
markers[tuple(local_max.T)] = np.arange(1, len(local_max) + 1)
self.labels = watershed(-distance, markers, mask=binary_cleaned)
self.binary = binary_cleaned
def extract_features(self):
"""Feature extraction"""
self.particles = []
for region in measure.regionprops(self.labels):
area = region.area
diameter = np.sqrt(4 * area / np.pi)
aspect_ratio = region.major_axis_length / \
(region.minor_axis_length + 1e-10)
circularity = 4 * np.pi * area / \
(region.perimeter ** 2 + 1e-10)
self.particles.append({
'label': region.label,
'area': area,
'diameter': diameter,
'aspect_ratio': aspect_ratio,
'circularity': circularity,
'centroid': region.centroid
})
def get_results(self) -> ParticleAnalysisResult:
"""Get results"""
df = pd.DataFrame(self.particles)
return ParticleAnalysisResult(
num_particles=len(self.particles),
mean_diameter=df['diameter'].mean(),
std_diameter=df['diameter'].std(),
mean_circularity=df['circularity'].mean(),
particle_data=self.particles
)
def visualize(self):
"""Visualize results"""
fig, axes = plt.subplots(2, 2, figsize=(14, 14))
# Original image
axes[0, 0].imshow(self.image, cmap='gray')
axes[0, 0].set_title('Preprocessed Image')
axes[0, 0].axis('off')
# Binarization
axes[0, 1].imshow(self.binary, cmap='gray')
axes[0, 1].set_title('Binary Segmentation')
axes[0, 1].axis('off')
# Watershed labels
axes[1, 0].imshow(self.labels, cmap='nipy_spectral')
axes[1, 0].set_title(f'Particles ({len(self.particles)})')
axes[1, 0].axis('off')
# Particle size distribution
df = pd.DataFrame(self.particles)
axes[1, 1].hist(df['diameter'], bins=20, alpha=0.7,
edgecolor='black')
axes[1, 1].set_xlabel('Diameter (pixels)')
axes[1, 1].set_ylabel('Frequency')
axes[1, 1].set_title('Particle Size Distribution')
axes[1, 1].grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()
def save_results(self, filename='analysis_results.json'):
"""Save results to JSON"""
results = self.get_results()
output = {
'num_particles': results.num_particles,
'mean_diameter': results.mean_diameter,
'std_diameter': results.std_diameter,
'mean_circularity': results.mean_circularity,
'particles': results.particle_data
}
with open(filename, 'w') as f:
json.dump(output, f, indent=2)
print(f"Results saved to {filename}")
# Usage Example
analyzer = SEMImageAnalyzer()
analyzer.load_image(noisy_image)
analyzer.preprocess(denoise_strength=10)
analyzer.segment_particles(min_size=50)
analyzer.extract_features()
results = analyzer.get_results()
print("=== Analysis Results ===")
print(f"Detected particles: {results.num_particles}")
print(f"Mean diameter: {results.mean_diameter:.2f} ± {results.std_diameter:.2f} pixels")
print(f"Mean circularity: {results.mean_circularity:.3f}")
analyzer.visualize()
analyzer.save_results('sem_analysis.json')
3.7 Practical Pitfalls and Solutions
Common Failure Examples and Best Practices
Failure 1: Excessive Segmentation (Over-segmentation)
Symptom: Single particle is divided into multiple parts
Cause: Watershed min_distance is too small, or residual noise remains
Solution:
# ❌ Bad Example: Excessive segmentation
local_max = peak_local_max(distance, min_distance=5) # Too small
# Result: Noise or minor irregularities within particles are recognized as separate particles
# ✅ Good Example: Appropriate min_distance setting
# Set based on typical particle diameter
expected_diameter_pixels = 30
min_distance = int(expected_diameter_pixels * 0.6) # About 60% of diameter
local_max = peak_local_max(
distance,
min_distance=min_distance,
threshold_abs=5, # Threshold for distance transform values
labels=binary_cleaned
)
print(f"min_distance: {min_distance} pixels")
print(f"Number of detected particles: {len(local_max)}")
Pitfall 2: Incorrect Threshold Selection
Symptom: Part of particles is missing, or background is detected as particles
Cause: Otsu method is inappropriate (non-bimodal histogram)
Solution:
# ❌ Bad Example: Apply Otsu unconditionally to all images
threshold = filters.threshold_otsu(image)
binary = image > threshold
# ✅ Good Example: Histogram verification and fallback
threshold_otsu = filters.threshold_otsu(image)
# Check histogram shape
hist, bin_edges = np.histogram(image, bins=256, range=(0, 256))
# Simple check for bimodality (two peaks)
from scipy.signal import find_peaks
peaks, _ = find_peaks(hist, prominence=100)
if len(peaks) >= 2:
# Bimodal → Otsu applicable
threshold = threshold_otsu
print(f"Otsu threshold: {threshold:.1f} (bimodal histogram)")
else:
# Unimodal → Alternative method (Mean + Standard deviation, etc.)
threshold = image.mean() + image.std()
print(f"Mean+Std threshold: {threshold:.1f} (unimodal histogram)")
print("Warning: Histogram is unimodal. Otsu method may be inappropriate")
binary = image > threshold
Pitfall 3: Lack of Pixel Calibration
Symptom: Physical size (nm, μm) is unknown, data at different magnifications cannot be compared
Cause: Scale bar information not utilized
Solution:
# ❌ Bad Example: Report in pixel units
print(f"Mean particle diameter: {mean_diameter:.1f} pixels") # Physical size unknown
# ✅ Good Example: Scale bar calibration
# Step 1: Measure scale bar length (manual or OCR)
SCALE_BAR_NM = 500 # Physical size of scale bar (nm)
SCALE_BAR_PIXELS = 200 # Number of pixels in scale bar
# Step 2: Calculate calibration coefficient
nm_per_pixel = SCALE_BAR_NM / SCALE_BAR_PIXELS
print(f"Calibration: {nm_per_pixel:.2f} nm/pixel")
# Step 3: Convert to physical size
mean_diameter_nm = mean_diameter * nm_per_pixel
std_diameter_nm = std_diameter * nm_per_pixel
print(f"Mean particle diameter: {mean_diameter_nm:.1f} ± {std_diameter_nm:.1f} nm")
# Step 4: Save calibration information to results
calibration_info = {
'scale_bar_nm': SCALE_BAR_NM,
'scale_bar_pixels': SCALE_BAR_PIXELS,
'nm_per_pixel': nm_per_pixel,
'calibration_date': '2025-10-19'
}
Pitfall 4: CNN Overfitting (Biased Training Data)
Symptom: High training accuracy but poor performance on test data
Cause: Dataset bias, insufficient data augmentation
Solution:
# ❌ Bad Example: No data augmentation, small dataset
model.fit(X_train, y_train, epochs=50, batch_size=16)
# Result: Overfitting to training data
# ✅ Good Example: Data augmentation and Early Stopping
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import EarlyStopping
# Data augmentation settings
datagen = ImageDataGenerator(
rotation_range=20, # Random rotation
width_shift_range=0.1, # Horizontal shift
height_shift_range=0.1, # Vertical shift
horizontal_flip=True, # Horizontal flip
zoom_range=0.1, # Zoom
fill_mode='nearest'
)
# Early Stopping (Stop training when validation loss does not improve)
early_stop = EarlyStopping(
monitor='val_loss',
patience=5,
restore_best_weights=True
)
# Training
history = model.fit(
datagen.flow(X_train, y_train, batch_size=16),
validation_data=(X_test, y_test),
epochs=50,
callbacks=[early_stop],
verbose=1
)
print(f"Training stopped at epoch: {len(history.history['loss'])}")
Pitfall 5: Batch Effect in Batch Processing
Symptom: Even for samples under the same conditions, particle diameter differs systematically by measurement date
Cause: Instrument drift over time, calibration deviation
Solution:
# ❌ Bad Example: No correction between batches
results = []
for image_file in image_files:
analyzer.load_image(image_file)
analyzer.preprocess()
results.append(analyzer.get_results())
# ✅ Good Example: Correction using reference material
from datetime import datetime
# Measurement of reference material (known particle diameter)
STANDARD_DIAMETER_NM = 100.0 # Known physical size
def calibrate_with_standard(standard_image, expected_diameter_nm):
"""Get calibration coefficient from reference material"""
analyzer_std = SEMImageAnalyzer()
analyzer_std.load_image(standard_image)
analyzer_std.preprocess()
analyzer_std.segment_particles()
analyzer_std.extract_features()
results_std = analyzer_std.get_results()
measured_diameter_pixels = results_std.mean_diameter
nm_per_pixel = expected_diameter_nm / measured_diameter_pixels
calibration = {
'date': datetime.now().isoformat(),
'nm_per_pixel': nm_per_pixel,
'standard_diameter_nm': expected_diameter_nm,
'measured_pixels': measured_diameter_pixels
}
return calibration
# Measure standard sample at the beginning of each batch
batch_calibrations = {}
for batch_id, batch_images in batches.items():
# Standard sample measurement
standard_image = load_standard_image(batch_id)
calib = calibrate_with_standard(standard_image, STANDARD_DIAMETER_NM)
batch_calibrations[batch_id] = calib
print(f"Batch {batch_id}: {calib['nm_per_pixel']:.3f} nm/pixel")
# Analysis of samples in batch (apply calibration coefficient)
for image_file in batch_images:
analyzer.load_image(image_file)
analyzer.preprocess()
analyzer.segment_particles()
analyzer.extract_features()
results = analyzer.get_results()
# Convert to physical size using calibration coefficient
diameter_nm = results.mean_diameter * calib['nm_per_pixel']
print(f" Sample: {diameter_nm:.1f} nm")
3.8 Image Analysis Skills Checklist
Image Preprocessing Skills
Foundational Level
- [ ] Can explain characteristics of SEM/TEM images (contrast, noise)
- [ ] Can load images with OpenCV and convert to grayscale
- [ ] Understand the difference between histogram equalization and CLAHE
- [ ] Can perform noise removal with Gaussian filter
- [ ] Can perform binarization using Otsu method
Applied Level
- [ ] Can appropriately set parameters for Non-Local Means filter
- [ ] Can diagnose poor image contrast and select appropriate preprocessing
- [ ] Can use morphological operations (opening, closing)
- [ ] Can select appropriate threshold method based on histogram shape
- [ ] Can combine multiple filters to build optimal preprocessing pipeline
Advanced Level
- [ ] Can automatically detect image quality variations in batch processing and perform adaptive preprocessing
- [ ] Can design and implement custom filters (frequency domain filters)
- [ ] Can implement deep learning denoising (DnCNN)
- [ ] Can quantitatively evaluate the impact of image preprocessing on subsequent analysis
Particle Detection and Segmentation Skills
Foundational Level
- [ ] Understand the concept of distance transform
- [ ] Can explain the basic principles of Watershed method
- [ ] Can count the number of particles
- [ ] Can calculate particle diameter (equivalent circular diameter)
- [ ] Can visualize detection results
Applied Level
- [ ] Can set Watershed min_distance parameter according to particle diameter
- [ ] Can diagnose over-detection and missed detection, and adjust parameters
- [ ] Can separate overlapping particles
- [ ] Can calculate and interpret shape parameters (circularity, aspect ratio)
- [ ] Can fit particle size distribution with statistical model (log-normal distribution)
Advanced Level
- [ ] Can apply Active Contour model to complex-shaped particles
- [ ] Can implement semantic segmentation using machine learning (U-Net)
- [ ] Can perform 3D segmentation of 3D images (X-ray CT)
- [ ] Can quantitatively evaluate segmentation accuracy (Dice coefficient, IoU)
Image Measurement and Calibration Skills
Foundational Level
- [ ] Can locate scale bar position
- [ ] Can manually convert pixel to physical size
- [ ] Can report particle diameter in nm/μm units
- [ ] Recognize the existence of measurement error
Applied Level
- [ ] Can automatically calculate calibration coefficient from scale bar
- [ ] Can analyze images at different magnifications uniformly
- [ ] Can estimate calibration uncertainty and explicitly state uncertainty in results
- [ ] Can evaluate measurement repeatability (reproducibility)
- [ ] Can perform instrument calibration using reference materials
Advanced Level
- [ ] Can correct for batch effects (periodic measurement of standard samples)
- [ ] Can propagate measurement uncertainty (ISO GUM)
- [ ] Can verify measurement value compatibility between different instruments
- [ ] Can establish traceable calibration chain
Deep Learning and Image Classification Skills
Foundational Level
- [ ] Understand basic CNN structure (convolutional layers, pooling layers)
- [ ] Can explain the concept of transfer learning
- [ ] Can split training and test data
- [ ] Can evaluate model accuracy
- [ ] Can interpret confusion matrix
Applied Level
- [ ] Can apply data augmentation
- [ ] Can prevent overfitting with Early Stopping
- [ ] Can adjust hyperparameters (learning rate, batch size)
- [ ] Can compare different architectures (VGG, ResNet, EfficientNet)
- [ ] Can visualize decision basis using Grad-CAM
Advanced Level
- [ ] Can design custom CNN architectures
- [ ] Can handle imbalanced data (class weights, SMOTE)
- [ ] Can implement multi-task learning (classification + regression)
- [ ] Can quantify model uncertainty (Bayesian Deep Learning)
- [ ] Can implement few-shot learning with limited data
Integrated Skills: Overall Workflow
Foundational Level
- [ ] Can execute workflow: Load image → Preprocessing → Segmentation → Feature Extraction
- [ ] Can summarize and report results in tables
- [ ] Can visualize using graphs (histograms, scatter plots)
Applied Level
- [ ] Can automate batch processing of multiple images
- [ ] Can design processing pipeline as classes
- [ ] Can save results in JSON/CSV format
- [ ] Can implement error handling and logging
- [ ] Can manage parameters in YAML/JSON files
Advanced Level
- [ ] Can publish analysis tool with web interface (Streamlit, Gradio)
- [ ] Can build scalable batch processing on cloud (AWS, GCP)
- [ ] Can store and search results in database (MongoDB)
- [ ] Can automate testing with CI/CD pipeline
- [ ] Can automatically generate publication-quality figures and reports
3.9 Comprehensive Skills Assessment
Please conduct self-assessment based on the following criteria.
Level 1: Beginner (60%+ of foundational skills)
- Can perform basic image preprocessing and particle detection
- Can understand existing code and adjust parameters
- Can complete analysis on simple datasets
Next Steps: - Master Applied Level techniques (Watershed, data augmentation) - Challenge more complex real data - Deepen understanding of parameter meanings
Level 2: Intermediate (100% foundational + 60%+ applied)
- Can select appropriate preprocessing and segmentation for complex image data
- Can implement image classification using CNN
- Can perform batch processing and error handling
Next Steps: - Challenge advanced techniques (custom CNN, 3D analysis) - Practice in research projects - Optimize code and improve scalability
Level 3: Advanced (100% foundational + 100% applied + 60%+ advanced)
- Can design and implement new algorithms
- Can execute research paper-level analysis
- Can provide tools to other researchers
Next Steps: - Develop original methods - Paper writing and conference presentations - Contribute to open source
Level 4: Expert (90%+ of all items)
- Can lead the field of materials image analysis
- Can develop analysis methods for new measurement techniques
- Contributing to international community
Activity Examples: - Invited talks and tutorials - Leading collaborative research - Participating in standardization activities
3.10 Action Plan Template
Current Level: _____
Target Level (in 3 months): _____
Priority Skills to Strengthen (select 3):
Specific Action Plan:
Week 1-2: - [ ] Action 1: ____ - [ ] Action 2: ______
Week 3-4: - [ ] Action 1: ____ - [ ] Action 2: ______
Week 5-8: - [ ] Action 1: ____ - [ ] Action 2: ______
Week 9-12: - [ ] Action 1: ____ - [ ] Action 2: ______
Evaluation Metrics:
- [ ] Complete analysis on custom dataset
- [ ] Present at research meeting/seminar
- [ ] Publish results on GitHub/paper
3.11 Chapter Summary
What We Learned
-
Data Licensing and Reproducibility - Utilization of image data repositories - Recording image metadata and calibration information - Best Practices for Code Reproducibility
-
Image Preprocessing - Noise Removal (Gaussian, Median, Bilateral, NLM) - Contrast Adjustment (Histogram Equalization, CLAHE) - Binarization (Otsu method)
-
Particle Detection - Segmentation using Watershed method - Distance transform and local maxima detection - Morphological operations
-
Quantitative Analysis - Particle size distribution (histogram, cumulative distribution) - Shape parameters (circularity, aspect ratio) - Log-normal distribution fitting
-
Deep Learning - Transfer learning (VGG16) - Material image classification - Performance evaluation using confusion matrix
-
Practical Pitfalls - Avoiding excessive segmentation - Importance of pixel calibration - Correction of batch effects - Solutions for CNN overfitting
Key Points
- ✅ Always record image metadata (magnification, scale bar)
- ✅ Preprocessing quality determines segmentation accuracy
- ✅ Parameter tuning is important for Watershed method (min_distance, threshold_abs)
- ✅ Report particle diameter in physical units (nm, μm)
- ✅ Regularly verify pixel calibration with reference materials
- ✅ Transfer learning enables high-accuracy classification even with limited data
- ✅ Prevent overfitting with data augmentation and Early Stopping
Next Chapter
In Chapter 4, we will learn about time-series data and integrated analysis: - Preprocessing of temperature and pressure sensor data - Moving window analysis - Anomaly detection - Dimensionality reduction using PCA - Automation using sklearn Pipeline
Chapter 4: Time-Series Data and Integrated Analysis →
Exercises
Problem 1 (Difficulty: easy)
Determine whether the following statements are true or false.
- Bilateral Filter can preserve edges better than Gaussian Filter
- CLAHE applies the same histogram equalization to the entire image
- In the Watershed method, local maxima of the distance transform are considered as particle centers
Hint
1. Operating principle of Bilateral Filter (considers both spatial distance and intensity difference) 2. CLAHE "Adaptive" meaning of 3. Watershed algorithm flow (Distance transform → markers → watershed)Solution Example
**Answer**: 1. **True** - Bilateral Filter considers intensity differences, so smoothing is suppressed near edges 2. **False** - CLAHE divides the image into small regions (tiles) and performs adaptive histogram equalization for each region 3. **True** - Local maxima of the distance transform correspond to particle centers and are used as markers for Watershed **Explanation**: In image preprocessing, it is important to select appropriate methods according to the processing purpose (noise removal vs edge preservation). Watershed is a powerful method that can separate even touching particles by utilizing distance transform.Problem 2 (Difficulty: medium)
Perform particle detection and particle size distribution analysis on the following SEM image data.
# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0
"""
Example: Perform particle detection and particle size distribution an
Purpose: Demonstrate core concepts and implementation patterns
Target: Beginner to Intermediate
Execution time: ~5 seconds
Dependencies: None
"""
import numpy as np
# Generate sample SEM image
def generate_sample_sem():
np.random.seed(100)
img = np.zeros((512, 512), dtype=np.uint8)
for _ in range(40):
x = np.random.randint(30, 482)
y = np.random.randint(30, 482)
r = np.random.randint(10, 25)
cv2.circle(img, (x, y), r, 200, -1)
noise = np.random.normal(0, 30, img.shape)
return np.clip(img + noise, 0, 255).astype(np.uint8)
sample_image = generate_sample_sem()
Requirements: 1. Noise removal using Non-Local Means 2. Binarization using Otsu method 3. Particle detection using Watershed method 4. Plot particle size distribution as histogram 5. Output mean particle diameter and standard deviation
Hint
**Processing Flow**: 1. Noise removal using `cv2.fastNlMeansDenoising` 2. Calculate threshold using `filters.threshold_otsu` → Binarization 3. `distance_transform_edt` + `peak_local_max` + `watershed` 4. Feature extraction for particles using `measure.regionprops` 5. Visualization using `matplotlib.pyplot.hist`Solution Example
# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0
# - opencv-python>=4.8.0
"""
Example: Requirements:
1. Noise removal using Non-Local Means
2. Bina
Purpose: Demonstrate data visualization techniques
Target: Intermediate
Execution time: 2-5 seconds
Dependencies: None
"""
import numpy as np
import cv2
import matplotlib.pyplot as plt
from skimage import filters, morphology, measure
from skimage.feature import peak_local_max
from skimage.segmentation import watershed
from scipy.ndimage import distance_transform_edt
# Generate sample image
def generate_sample_sem():
np.random.seed(100)
img = np.zeros((512, 512), dtype=np.uint8)
for _ in range(40):
x = np.random.randint(30, 482)
y = np.random.randint(30, 482)
r = np.random.randint(10, 25)
cv2.circle(img, (x, y), r, 200, -1)
noise = np.random.normal(0, 30, img.shape)
return np.clip(img + noise, 0, 255).astype(np.uint8)
sample_image = generate_sample_sem()
# Step1: Noise Removal
denoised = cv2.fastNlMeansDenoising(sample_image, None, 10, 7, 21)
# Step2: OtsuBinarization
threshold = filters.threshold_otsu(denoised)
binary = denoised > threshold
binary = morphology.remove_small_objects(binary, min_size=30)
# Step 3: Watershed
distance = distance_transform_edt(binary)
local_max = peak_local_max(distance, min_distance=15,
threshold_abs=3, labels=binary)
markers = np.zeros_like(distance, dtype=int)
markers[tuple(local_max.T)] = np.arange(1, len(local_max) + 1)
labels = watershed(-distance, markers, mask=binary)
# Step4: Calculate particle diameter
diameters = []
for region in measure.regionprops(labels):
area = region.area
diameter = np.sqrt(4 * area / np.pi)
diameters.append(diameter)
diameters = np.array(diameters)
# Step5: Statistics and visualization
print("=== Particle Diameter Statistics ===")
print(f"Detected particles: {len(diameters)}")
print(f"MeanParticle diameter: {diameters.mean():.2f} pixels")
print(f"Standard deviation: {diameters.std():.2f} pixels")
fig, axes = plt.subplots(2, 2, figsize=(14, 14))
axes[0, 0].imshow(sample_image, cmap='gray')
axes[0, 0].set_title('Original Image')
axes[0, 0].axis('off')
axes[0, 1].imshow(binary, cmap='gray')
axes[0, 1].set_title(f'Binary (Otsu={threshold:.1f})')
axes[0, 1].axis('off')
axes[1, 0].imshow(labels, cmap='nipy_spectral')
axes[1, 0].set_title(f'Detected Particles ({len(diameters)})')
axes[1, 0].axis('off')
axes[1, 1].hist(diameters, bins=15, alpha=0.7, edgecolor='black')
axes[1, 1].axvline(diameters.mean(), color='red', linestyle='--',
label=f'Mean: {diameters.mean():.1f}')
axes[1, 1].set_xlabel('Diameter (pixels)')
axes[1, 1].set_ylabel('Frequency')
axes[1, 1].set_title('Particle Size Distribution')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()
**Output Example**:
=== Particle Diameter Statistics ===
Detected particles: 38
MeanParticle diameter: 30.45 pixels
Standard deviation: 8.23 pixels
**Explanation**:
In this example, setting min_distance=15 for Watershed suppresses over-detection of adjacent particles. The variability in particle diameter (standard deviation) is due to the synthesis process. In real data, it reflects measurement conditions and material inhomogeneity.
Problem 3 (Difficulty: hard)
Build a system to automatically process multiple SEM images and perform statistical comparison of particle size distributions.
Background: For material samples A, B, and C prepared under different synthesis conditions, 10 SEM images were taken for each. It is necessary to automatically analyze the particle size distribution of each sample and statistically compare them.
Task: 1. Automatically analyze 30 images through batch processing 2. Visualize particle size distribution for each sample 3. Statistical significance testing using analysis of variance (ANOVA) 4. Output results as PDF report
Constraints: - Measurement conditions (magnification, exposure) may differ for each image - Some images have poor contrast - Processing time: within 5 seconds/image
Hint
**Design Guidelines**: 1. Extend `SEMImageAnalyzer` class 2. Adaptive preprocessing (contrast determination via histogram analysis) 3. Save results in structured format (JSON/CSV) 4. `scipy.stats.f_oneway`for ANOVA 5. `matplotlib.backends.backend_pdf`for PDF generationSolution Example
**Solution Overview**: Build an integrated system including batch processing, statistical analysis, and report generation. **Implementation Code**:from scipy.stats import f_oneway
from matplotlib.backends.backend_pdf import PdfPages
class BatchSEMAnalyzer:
"""Batch SEM image analysis system"""
def __init__(self):
self.results = {}
def adaptive_preprocess(self, image):
"""Adaptive preprocessing"""
# Contrast evaluation
contrast = image.max() - image.min()
if contrast < 100: # Low contrast
# CLAHEEnhancement
clahe = cv2.createCLAHE(clipLimit=3.0,
tileGridSize=(8, 8))
image = clahe.apply(image)
# Noise removal (adaptive strength)
noise_std = np.std(np.diff(image, axis=0))
h = 10 if noise_std < 20 else 15
denoised = cv2.fastNlMeansDenoising(image, None, h, 7, 21)
return denoised
def analyze_single(self, image, sample_id):
"""Single image analysis"""
# Preprocessing
preprocessed = self.adaptive_preprocess(image)
# OtsuBinarization
threshold = filters.threshold_otsu(preprocessed)
binary = preprocessed > threshold
binary = morphology.remove_small_objects(binary, min_size=30)
# Watershed
distance = distance_transform_edt(binary)
local_max = peak_local_max(distance, min_distance=15,
labels=binary)
markers = np.zeros_like(distance, dtype=int)
markers[tuple(local_max.T)] = np.arange(1, len(local_max) + 1)
labels = watershed(-distance, markers, mask=binary)
# Extract particle diameter
diameters = []
for region in measure.regionprops(labels):
area = region.area
diameter = np.sqrt(4 * area / np.pi)
diameters.append(diameter)
return np.array(diameters)
def batch_analyze(self, image_dict):
"""
Batch analysis
Parameters:
-----------
image_dict : dict
{'sample_A': [img1, img2, ...], 'sample_B': [...]}
"""
for sample_id, images in image_dict.items():
all_diameters = []
for img in images:
diameters = self.analyze_single(img, sample_id)
all_diameters.extend(diameters)
self.results[sample_id] = np.array(all_diameters)
def statistical_comparison(self):
"""Statistical comparison (ANOVA)"""
groups = list(self.results.values())
f_stat, p_value = f_oneway(*groups)
print("=== Analysis of Variance (ANOVA) ===")
print(f"F-statistic: {f_stat:.3f}")
print(f"p-value: {p_value:.4f}")
if p_value < 0.05:
print("Conclusion: Significant difference between samples (p < 0.05)")
else:
print("Conclusion: No significant difference between samples (p ≥ 0.05)")
return f_stat, p_value
def generate_report(self, filename='sem_report.pdf'):
"""Generate PDF report"""
with PdfPages(filename) as pdf:
# Page 1: Particle size distribution comparison
fig, axes = plt.subplots(2, 2, figsize=(11, 8.5))
for i, (sample_id, diameters) in enumerate(self.results.items()):
ax = axes.ravel()[i]
ax.hist(diameters, bins=20, alpha=0.7, edgecolor='black')
ax.axvline(diameters.mean(), color='red',
linestyle='--',
label=f'Mean: {diameters.mean():.1f}')
ax.set_xlabel('Diameter (pixels)')
ax.set_ylabel('Frequency')
ax.set_title(f'{sample_id} (n={len(diameters)})')
ax.legend()
ax.grid(True, alpha=0.3, axis='y')
# Statistical Summary
ax = axes.ravel()[3]
ax.axis('off')
summary_text = "=== Statistical Summary ===\n\n"
for sample_id, diameters in self.results.items():
summary_text += f"{sample_id}:\n"
summary_text += f" Mean: {diameters.mean():.2f}\n"
summary_text += f" Std: {diameters.std():.2f}\n"
summary_text += f" n: {len(diameters)}\n\n"
ax.text(0.1, 0.5, summary_text, fontsize=12,
verticalalignment='center', family='monospace')
plt.tight_layout()
pdf.savefig(fig)
plt.close()
# Page 2: Box plot comparison
fig, ax = plt.subplots(figsize=(11, 8.5))
data = [self.results[key] for key in self.results.keys()]
ax.boxplot(data, labels=list(self.results.keys()))
ax.set_ylabel('Diameter (pixels)')
ax.set_title('Particle Size Distribution Comparison')
ax.grid(True, alpha=0.3, axis='y')
pdf.savefig(fig)
plt.close()
print(f"Report saved to {filename}")
# Demo execution
if __name__ == "__main__":
# Generate sample data
np.random.seed(42)
image_dict = {}
for sample_id, mean_size in [('Sample_A', 25),
('Sample_B', 35),
('Sample_C', 30)]:
images = []
for _ in range(10):
img = np.zeros((512, 512), dtype=np.uint8)
num_particles = np.random.randint(30, 50)
for _ in range(num_particles):
x = np.random.randint(30, 482)
y = np.random.randint(30, 482)
r = int(np.random.normal(mean_size, 5))
r = max(10, min(40, r))
cv2.circle(img, (x, y), r, 200, -1)
noise = np.random.normal(0, 25, img.shape)
img = np.clip(img + noise, 0, 255).astype(np.uint8)
images.append(img)
image_dict[sample_id] = images
# Batch analysis
analyzer = BatchSEMAnalyzer()
analyzer.batch_analyze(image_dict)
# Statistical comparison
analyzer.statistical_comparison()
# Generate report
analyzer.generate_report('sem_comparison_report.pdf')
print("\n=== Statistics by Sample ===")
for sample_id, diameters in analyzer.results.items():
print(f"{sample_id}:")
print(f" Number of particles: {len(diameters)}")
print(f" Mean: {diameters.mean():.2f} ± {diameters.std():.2f}")
**Result Example**:
=== Analysis of Variance (ANOVA) ===
F-statistic: 124.567
p-value: 0.0001
Conclusion: Significant difference between samples (p < 0.05)
=== Statistics by Sample ===
Sample_A:
Number of particles: 423
Mean: 25.12 ± 4.89
Sample_B:
Number of particles: 398
Mean: 35.34 ± 5.23
Sample_C:
Number of particles: 415
Mean: 30.05 ± 4.76
Report saved to sem_comparison_report.pdf
**Detailed Explanation**:
1. **Adaptive preprocessing**: Evaluate contrast and noise level of each image and automatically adjust parameters
2. **Statistical testing**: Quantitatively evaluate particle diameter differences among 3 groups using ANOVA
3. **PDF output**: Automatically generate multi-page report (directly usable for papers and reports)
**Additional Considerations**:
- Multiple comparison using Tukey HSD test (which pairs have significant differences)
- Shape comparison of particle size distribution (skewness, kurtosis)
- Image quality evaluation using machine learning (automatic exclusion of poor images)
References
-
Bradski, G., & Kaehler, A. (2008). "Learning OpenCV: Computer Vision with the OpenCV Library." O'Reilly Media. ISBN: 978-0596516130
-
van der Walt, S. et al. (2014). "scikit-image: image processing in Python." PeerJ, 2, e453. DOI: 10.7717/peerj.453
-
Beucher, S., & Meyer, F. (1993). "The morphological approach to segmentation: the watershed transformation." Mathematical Morphology in Image Processing, 433-481.
-
Simonyan, K., & Zisserman, A. (2015). "Very Deep Convolutional Networks for Large-Scale Image Recognition." ICLR 2015. arXiv: 1409.1556
-
OpenCV Documentation: Image Processing. URL: https://docs.opencv.org/4.x/d2/d96/tutorial_py_table_of_contents_imgproc.html
-
EMPIAR (Electron Microscopy Public Image Archive). URL: https://www.ebi.ac.uk/empiar/
-
NanoMine Database. URL: https://materialsmine.org/
Navigation
← Previous Chapter
Chapter 2: Spectral Data Analysis ←
Next Chapter
Chapter 4: Time-Series Data and Integrated Analysis →
Series Index
Author Information
Author: AI Terakoya Content Team Created: 2025-10-17 Updated: 2025-10-19 Version: 1.1
Update History: - 2025-10-19: v1.1 Added data licensing, practical pitfalls, and skills checklist - 2025-10-17: v1.0 Initial release
Feedback: - GitHub Issues: [Repository URL]/issues - Email: yusuke.hashimoto.b8@tohoku.ac.jp
License: Creative Commons BY 4.0
Continue learning in the next chapter!