Materials Property Mapping Introduction Series v1.0

Visualization and exploration of high-dimensional materials space using GNN and dimensionality reduction

Series Overview

This series is a comprehensive 4-chapter educational program teaching practical skills to effectively visualize high-dimensional materials space and accelerate materials discovery by combining Graph Neural Networks (GNN) representation learning with dimensionality reduction techniques.

Materials property mapping is a technology that represents thousands to tens of thousands of materials as points in high-dimensional property space, projects them into 2D or 3D space, and visualizes them to intuitively understand material similarity, structure-property relationships, and regions to explore. By using embeddings automatically learned by GNNs, which capture structural information that cannot be captured by conventional composition-based descriptors, more intrinsic materials space mapping becomes possible.

Why This Series Is Needed

Background and Challenges: Finding optimal materials from tens of thousands of candidates in materials discovery is extremely difficult. Traditional approaches relied on human experience and intuition, but intuition does not work well in high-dimensional property spaces, leading to many promising materials being overlooked. In particular, even if compositions are similar, properties can differ significantly if crystal structures differ, and conversely, even if compositions differ, similar structures can exhibit similar properties. Appropriate visualization and mapping techniques are essential to understand such complex relationships and efficiently explore materials space.

What You Will Learn in This Series: This series provides systematic learning from the basics of materials space visualization to dimensionality reduction techniques such as PCA, t-SNE, and UMAP, materials representation learning using GNN, and construction of practical materials mapping systems that integrate both, with 70 executable Python code examples. You will master the complete end-to-end workflow from acquiring real data from Materials Project API, through GNN training, embedding extraction, dimensionality reduction, clustering, to materials recommendation systems.

Features: - ✅ Practice-oriented: 70 executable code examples, Materials Project API integration - ✅ Progressive structure: Basic visualization → Dimensionality reduction → GNN → Integrated system - ✅ Latest technologies: Combination of CGCNN, MEGNet, SchNet with UMAP, t-SNE - ✅ Interactive visualization: Exploratory data analysis using Plotly and Bokeh - ✅ Practical applications: Materials recommendation system, clustering analysis, extrapolation region detection

Total Learning Time: 90-110 minutes (including code execution and exercises)

Target Audience: - Graduate students in materials science/chemistry (master's and doctoral programs) - R&D engineers in industry (materials development, data analysis) - Computational materials scientists (experience with DFT, MD simulations) - Data scientists (aiming to apply to materials/chemistry fields)

How to Learn

Recommended Learning Order

flowchart TD A[Chapter 1: Fundamentals of Materials Space Visualization] --> B[Chapter 2: Dimensionality Reduction Methods] B --> C[Chapter 3: Materials Representation Learning with GNN] C --> D[Chapter 4: Practical - Building Integrated Systems] style A fill:#e3f2fd style B fill:#fff3e0 style C fill:#f3e5f5 style D fill:#e8f5e9

For beginners (first time learning GNN and dimensionality reduction): - Chapter 1 → Chapter 2 → Chapter 3 → Chapter 4 (all chapters recommended) - Duration: 90-110 minutes - Prerequisites: GNN Introduction Series or deep learning basics, advanced Python level

For intermediate learners (with GNN experience): - Chapter 2 → Chapter 3 → Chapter 4 - Duration: 70-90 minutes - Chapter 1 can be skipped (refer as needed)

For practical skill enhancement (implementation-focused): - Chapter 3 (GNN implementation) → Chapter 4 (integrated system) - Duration: 50-70 minutes - Refer to Chapters 1 and 2 for theory as needed

Chapter Details

Chapter 1: Fundamentals of Materials Space Visualization

Difficulty: Introductory Reading Time: 20-25 minutes Code Examples: 5

Learning Content

What is Materials Space - Dimensions of property space and challenges of high-dimensional data - Concept of representing materials as points - The curse of dimensionality and visualization limits
Preparing Materials Data - Calculating basic statistics - Property distribution histograms - Data preprocessing and cleaning
Basic Visualization with 2D Scatter Plots - Scatter plots between two properties - Pair plots (multivariate correlations) - Color coding and size mapping
Correlation Matrix Visualization - Correlation analysis using heatmaps - Identifying property pairs with strong correlations

Learning Objectives

After completing this chapter, you will be able to:

✅ Explain the concept of materials space and challenges of high-dimensional data
✅ Visualize basic statistics and data distributions
✅ Analyze relationships between properties using scatter plots and pair plots
✅ Find important property combinations from correlation matrices

Read Chapter 1 →

Chapter 2: Mapping Materials Space with Dimensionality Reduction Methods

Difficulty: Beginner to Intermediate Reading Time: 25-30 minutes Code Examples: 15

Learning Content

Principal Component Analysis (PCA) - PCA basic implementation and variance contribution analysis - Scree plot - Loading plot (biplot)
t-SNE (t-Distributed Stochastic Neighbor Embedding) - t-SNE implementation and effect of perplexity parameter - Clustering result visualization - Neighborhood preservation rate evaluation
UMAP (Uniform Manifold Approximation and Projection) - UMAP implementation and n_neighbors parameter optimization - Creating density maps - 3D visualization with 3D UMAP
Method Comparison - Performance comparison of PCA vs t-SNE vs UMAP - Quantitative evaluation using neighborhood preservation rate - Method selection according to use cases
Interactive Visualization - 3D visualization with Plotly - Interactive scatter plots with Bokeh - Visualization of dimensionality reduction process through animation

Learning Objectives

✅ Understand the principles and implementation of PCA, t-SNE, and UMAP
✅ Appropriately adjust parameters for each method
✅ Compare methods using evaluation metrics such as neighborhood preservation rate
✅ Create interactive visualizations with Plotly and Bokeh
✅ Select optimal dimensionality reduction method according to purpose

Read Chapter 2 →

Chapter 3: Materials Representation Learning with GNN

Difficulty: Intermediate to Advanced Reading Time: 25-30 minutes Code Examples: 20 (all executable)

Learning Content

Graph Representation of Materials - Conversion from crystal structure to graph - Atomic feature encoding - PyTorch Geometric data structure
Crystal Graph Convolutional Neural Network (CGCNN) - CGCNN convolution layer implementation - Complete CGCNN model construction - Training loop and Early Stopping
MEGNet (MatErials Graph Network) - MEGNet block considering global state - Complete MEGNet model - Comparison with CGCNN
SchNet - Continuous filter convolution layer - Distance embedding using Gaussian basis functions - Complete SchNet model
Embedding Visualization and Analysis - Visualizing GNN embeddings with UMAP - Comparing multiple models with t-SNE - Clustering and property analysis - Quantitative evaluation of embedding quality

Learning Objectives

✅ Understand and implement methods for representing materials as graphs
✅ Implement and compare performance of CGCNN, MEGNet, and SchNet
✅ Extract embeddings obtained from GNN
✅ Visualize embeddings with UMAP and t-SNE, and analyze clusters
✅ Evaluate embedding quality using metrics such as silhouette score

Read Chapter 3 →

Chapter 4: Practical - Materials Mapping with GNN + Dimensionality Reduction

Difficulty: Intermediate to Advanced Reading Time: 30-35 minutes Code Examples: 30 (end-to-end implementation)

Learning Content

Environment Setup and Data Collection - Materials Project API configuration - Real data acquisition and exploratory analysis
Building Graph Datasets - Optimized conversion from crystal structure to graph - Creating custom dataset classes - DataLoader construction
GNN Model Training - Improved CGCNN model - Training loop with Early Stopping - Evaluation on test data
Embedding Extraction and Dimensionality Reduction - Extracting embeddings from all data - Dimensionality reduction using PCA, UMAP, t-SNE - Comparing dimensionality reduction methods
Materials Space Analysis - Clustering and property analysis - Creating density maps - Searching for neighboring materials - Implementing materials recommendation system
Interactive Visualization - 3D UMAP with Plotly - Interactive scatter plots with Bokeh - Dashboard with Dash (optional)
Advanced Analysis and Applications - Voronoi tessellation - Visualizing property gradients - Detecting extrapolation regions - Generating comprehensive reports

Learning Objectives

✅ Acquire real data from Materials Project API
✅ Build complete GNN training pipeline
✅ Visualize trained GNN embeddings with dimensionality reduction
✅ Implement clustering and materials recommendation system
✅ Create interactive exploration systems with Plotly and Bokeh
✅ Apply to actual materials design tasks

Read Chapter 4 →

Overall Learning Outcomes

Upon completing this series, you will acquire the following skills and knowledge:

Knowledge Level (Understanding)

✅ Can explain concepts of materials space and high-dimensional data visualization
✅ Understand principles and usage of PCA, t-SNE, and UMAP
✅ Understand features and implementation of CGCNN, MEGNet, and SchNet
✅ Can explain advantages of combining GNN embeddings with dimensionality reduction

Practical Skills (Doing)

✅ Can acquire materials data from Materials Project API
✅ Can convert crystal structures to graph data
✅ Can implement and train CGCNN, MEGNet, and SchNet
✅ Can extract GNN embeddings and visualize with UMAP/t-SNE
✅ Can build clustering and materials recommendation systems
✅ Can create interactive visualizations with Plotly and Bokeh

Application Ability (Applying)

✅ Can build mapping systems for new materials datasets
✅ Can obtain materials design insights from cluster analysis
✅ Can suggest next materials to synthesize using materials recommendation system
✅ Can detect extrapolation regions and evaluate prediction reliability
✅ Can generate comprehensive analysis reports and utilize in research

Recommended Learning Patterns

Pattern 1: Complete Mastery (For Beginners)

Target: Those learning GNN and dimensionality reduction for the first time Duration: 2-3 weeks Approach:

Week 1:
- Day 1-2: Chapter 1 (Fundamentals of Materials Space Visualization)
- Day 3-5: Chapter 2 (Dimensionality Reduction Methods)
- Day 6-7: Chapter 2 exercises, method comparison

Week 2:
- Day 1-3: Chapter 3 (GNN implementation + embedding extraction)
- Day 4-7: Chapter 3 (multiple model implementation + visualization)

Week 3:
- Day 1-4: Chapter 4 (building integrated system)
- Day 5-7: Chapter 4 (advanced analysis + report generation)

Deliverables: - Materials mapping system with Materials Project data - Interactive 3D visualization (Plotly) - Materials recommendation system implementation - GitHub repository (all code + README)

Pattern 2: Intensive (For Experienced Learners)

Target: Those with GNN and machine learning basics Duration: 1 week Approach:

Day 1: Chapter 2 (dimensionality reduction method implementation)
Day 2-3: Chapter 3 (GNN implementation)
Day 4-6: Chapter 4 (building integrated system)
Day 7: Application to original data

Deliverables: - Performance comparison of multiple GNN models (CGCNN, MEGNet, SchNet) - Integrated materials mapping system - Interactive dashboard

FAQ (Frequently Asked Questions)

Q1: Can I understand without completing the GNN Introduction Series?

A: Basic GNN knowledge is a prerequisite. Chapters 3 and 4 assume GNN implementation experience. Strongly recommend completing Chapters 1-3 of the GNN Introduction Series first. Minimum required skills: PyTorch Geometric basics, message passing concept, graph data handling.

Q2: Can I learn without a Materials Project API key?

A: Yes, you can learn without an API key. Chapter 4 provides dummy data generation code, so all code examples can be executed without an API key. However, if you want to try with real data, create a free account at Materials Project and obtain an API key (takes about 5 minutes).

Q3: How much computing resources (GPU) are required?

A: GPU is recommended for training but CPU is also possible:

CPU only: - Possible: Training with dummy data (1000 materials) - Training time: 10-30 minutes (CGCNN) - Google Colab free tier (CPU) is sufficient

GPU recommended: - Real data training (10,000+ materials) - Recommended GPU: NVIDIA RTX 3060 (12GB VRAM) or higher - Training time: 5-15 minutes (CGCNN) - Google Colab Pro (GPU) is convenient

Google Colab free tier is sufficient for this series.

Q4: Should I use UMAP or t-SNE?

A: UMAP is recommended, but it depends on the purpose:

UMAP advantages: - Large-scale data (10,000+ points) - Limited computation time - Want to preserve global structure - 3D visualization needed

t-SNE advantages: - Small-scale data (1,000 or fewer points) - Emphasizing cluster structure is important - Want to use commonly used method in papers

Best practice: Try both and compare, select according to purpose.

Q5: How should I interpret clustering results?

A: Start by comparing average property values for each cluster:

Calculate average property values for each cluster (see Chapter 4 code example 16)
Identify characteristic clusters (e.g., high band gap cluster)
Investigate materials within clusters in detail
Confirm structural similarity (same crystal system, similar composition, etc.)
Determine direction for new materials exploration

Important to note: Clusters are just statistical groups and physical meaning requires separate validation.

Q6: Why is detecting extrapolation regions important?

A: To evaluate prediction reliability:

Within training data range: GNN predictions are highly accurate (MAE < 0.1 eV)
Extrapolation region: Prediction accuracy may decrease (MAE > 0.3 eV)

Detecting extrapolation regions allows: - Distinguish reliable predictions from uncertain ones - Identify materials requiring additional experiments - Utilize in active learning for next sample selection

Implementation method explained in Chapter 4 code example 29.

Q7: Can I become a materials mapping expert just from this series?

A: This series targets "bridging from intermediate to advanced." To reach expert level:

Build foundation with this series (2-3 weeks)
Execute projects with original data (1-3 months) - Build mapping system with your research data - Experiment with new GNN architectures
Read papers and track latest technologies (ongoing) - Latest papers on GNN + UMAP - Trends in Materials Informatics field
Conference presentations and paper writing (6 months~1 year)

1-2 years of continuous learning and practice required. This series is optimal as a starting point.

Q8: How accurate is the materials recommendation system?

A: Depends on data and purpose, but general indicators:

Top-5 recommendation precision: 60-80% (within ±10% of target property)
Top-10 recommendation precision: 70-90%
Acceleration of new materials discovery: Reduce experiments by 50-90%

Points for improving accuracy: 1. Improve GNN model prediction accuracy (target R² > 0.9) 2. Sufficient training data (10,000+ materials) 3. Appropriate embedding dimensions (64-128 dimensions) 4. Adjust distance metric (cosine similarity, Euclidean distance)

Prerequisites and Related Series

Prerequisites

Required: - [ ] GNN Basics: Message passing, graph convolution, PyTorch Geometric - [ ] Advanced Python: Classes, generators, decorators, type hints - [ ] Machine Learning: Training loops, overfitting, evaluation metrics

Recommended: - [ ] Linear Algebra: Matrix operations, eigenvalue decomposition, PCA - [ ] Materials Science: Crystal structures, material properties, band gap - [ ] Data Visualization: Matplotlib, Seaborn, Plotly

Prerequisite Series

GNN Introduction Series (Intermediate) - Content: GNN basic theory, PyTorch Geometric, CGCNN/SchNet implementation - Learning time: 110-130 minutes - Why recommended: Systematically learn GNN basics - Required: Necessary to understand Chapters 3 and 4

Related Series

Bayesian Optimization Introduction (Intermediate) - Relevance: Efficient materials search utilizing materials mapping results - Link: ../bayesian-optimization-introduction/index.html
Active Learning Introduction (Intermediate) - Relevance: Next sample selection in embedding space - Link: ../active-learning-introduction/index.html
(Introductory) - Relevance: Overall picture and basics of materials informatics - Link:

Tools and Resources

Main Tools

Tool Name	Purpose	License	Installation
PyTorch Geometric	GNN implementation	MIT	`pip install torch-geometric`
UMAP	Dimensionality reduction	BSD-3	`pip install umap-learn`
scikit-learn	Machine learning/dimensionality reduction	BSD-3	`pip install scikit-learn`
Plotly	Interactive visualization	MIT	`pip install plotly`
Bokeh	Interactive visualization	BSD-3	`pip install bokeh`
pymatgen	Materials structure manipulation	MIT	`pip install pymatgen`
mp-api	Materials Project API	BSD	`pip install mp-api`

Databases

Database Name	Description	Data Count	Access
Materials Project	Crystal structure and DFT calculation data	140,000 materials	https://materialsproject.org/
AFLOW	High-throughput computation data	3,500,000 materials	http://aflowlib.org/
OQMD	Quantum materials database	1,000,000 materials	http://oqmd.org/

Learning Resources

Papers and Reviews: - Xie, T. & Grossman, J. C. (2018). "Crystal Graph Convolutional Neural Networks". Physical Review Letters. - McInnes, L. et al. (2018). "UMAP: Uniform Manifold Approximation and Projection". arXiv:1802.03426. - van der Maaten, L. & Hinton, G. (2008). "Visualizing Data using t-SNE". JMLR.

Online Resources: - UMAP Documentation - Plotly Python Graphing Library - Materials Project API Docs

Next Steps

Recommended Actions After Series Completion

Immediate (within 1-2 weeks): 1. ✅ Create portfolio on GitHub 2. ✅ Build mapping system with original data 3. ✅ Publish interactive dashboard 4. ✅ Write blog article (Qiita, Medium)

Short-term (1-3 months): 1. ✅ Accelerate materials search by combining with Bayesian optimization 2. ✅ Efficiently collect data with active learning 3. ✅ Conference presentation (Japan Society of Materials Science, MRS) 4. ✅ Proceed to Bayesian Optimization Introduction Series

Long-term (6 months or more): 1. ✅ Paper writing (npj Computational Materials, Chemistry of Materials) 2. ✅ Industrial application projects 3. ✅ Develop new GNN + dimensionality reduction methods

License and Terms of Use

CC BY 4.0 (Creative Commons Attribution 4.0 International)

What You Can Do

✅ Free viewing and downloading
✅ Educational use (classes, training, study groups)
✅ Modifications and derivative works
✅ Commercial use (corporate training, paid courses)

Conditions

📌 Author credit display: "Dr. Yusuke Hashimoto, Tohoku University - AI Terakoya"
📌 License inheritance (remain CC BY 4.0)

Let's Get Started!

Ready? Start with Chapter 1 and begin your journey into the world of materials property mapping!

Chapter 1: Fundamentals of Materials Space Visualization →

Version History

Version	Date	Changes	Author
1.0	2025-10-20	Initial release	Dr. Yusuke Hashimoto

Your materials mapping learning journey starts here!

flowchart LR subgraph "Materials Mapping Workflow" A[Materials Data\nMP/AFLOW] --> B[GNN Training\nCGCNN/MEGNet] B --> C[Embedding\nExtraction] C --> D[Dimensionality\nReduction] D --> E[Visualization &\nAnalysis] E --> F[Materials\nDiscovery] end style A fill:#e3f2fd style B fill:#fff3e0 style C fill:#f3e5f5 style D fill:#e8f5e9 style E fill:#fce4ec style F fill:#ffebee

Materials Property Mapping Introduction Series v1.0

Series Overview

Why This Series Is Needed

How to Learn

Recommended Learning Order

Chapter Details

Chapter 1: Fundamentals of Materials Space Visualization

Learning Content

Learning Objectives

Chapter 2: Mapping Materials Space with Dimensionality Reduction Methods

Learning Content

Learning Objectives

Chapter 3: Materials Representation Learning with GNN

Learning Content

Learning Objectives

Chapter 4: Practical - Materials Mapping with GNN + Dimensionality Reduction

Learning Content

Learning Objectives

Overall Learning Outcomes

Knowledge Level (Understanding)

Practical Skills (Doing)

Application Ability (Applying)

Recommended Learning Patterns

Pattern 1: Complete Mastery (For Beginners)

Pattern 2: Intensive (For Experienced Learners)

FAQ (Frequently Asked Questions)

Q1: Can I understand without completing the GNN Introduction Series?

Q2: Can I learn without a Materials Project API key?

Q3: How much computing resources (GPU) are required?

Q4: Should I use UMAP or t-SNE?

Q5: How should I interpret clustering results?

Q6: Why is detecting extrapolation regions important?

Q7: Can I become a materials mapping expert just from this series?

Q8: How accurate is the materials recommendation system?

Prerequisites and Related Series

Prerequisites

Prerequisite Series

Related Series

Tools and Resources

Main Tools

Databases

Learning Resources

Next Steps

Recommended Actions After Series Completion

License and Terms of Use

What You Can Do

Conditions

Let's Get Started!

Version History

Disclaimer