This chapter covers the fundamentals of GNN structure-based features. You will learn why composition-based features fall short, the three steps of message passing, and the differences between CGCNN and MPNN.
Structural Information Captured by Graph Representations: A World Invisible to Composition-Based Features
1.1 Limitations of Composition-Based Features
In materials science AI prediction, composition-based features have long been the mainstream approach. Methods like Magpie (Ward et al., 2016) and Matminer calculate statistical features from chemical composition (e.g., Fe₂O₃).
1.1.1 Examples of Composition-Based Features
# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0
"""
Example: 1.1.1 Examples of Composition-Based Features
Purpose: Compute Magpie composition-based features for Fe₂O₃
Target: Beginner to Intermediate
Execution time: ~5 seconds
Dependencies: matminer, pymatgen, scikit-learn
"""
# Google Colab environment setup
!pip install matminer pymatgen scikit-learn
import numpy as np
from matminer.featurizers.composition import ElementProperty
from pymatgen.core.composition import Composition
# Example 1: Computing Magpie features
magpie = ElementProperty.from_preset("magpie")
# Features for Fe₂O₃ (iron oxide)
comp = Composition("Fe2O3")
features = magpie.featurize(comp)
print(f"Feature dimensionality: {len(features)}")
print(f"First 10 features: {features[:10]}")
# Output example: Feature dimensionality: 132
# Output example: First 10 features: statistics of atomic number (minimum, maximum, range, mean, ...)
Composition-based features include the following 132 dimensions: average atomic weight and density, statistics of electronegativity and ionic radius (mean, variance, max, min), and statistics of electron configuration.
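To see where these dimensions come from, the featurizer can report its own labels. A minimal sketch, reusing the `magpie` object from Example 1 (the printed label strings are indicative, not guaranteed verbatim):
# Inspect how the 132 dimensions are composed
labels = magpie.feature_labels()
print(len(labels))  # 132 = 22 elemental properties × 6 statistics
print(labels[:3])   # e.g. ['MagpieData minimum Number', 'MagpieData maximum Number', ...]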
1.1.2 Critical Limitation: Missing Structural Information
Composition-based features do not consider spatial arrangement of atoms at all. This causes the following problems:
Concrete Examples:
- Diamond vs Graphite (both are C):
- Diamond: sp³ hybridization, tetrahedral structure, hardness 10,000 HV
- Graphite: sp² hybridization, layered structure, hardness 2-3 HV
- Composition-based features are completely identical!
- Polymorphs of SiO₂ (α-quartz vs β-cristobalite):
- α-quartz: density 2.65 g/cm³, hexagonal
- β-cristobalite: density 2.33 g/cm³, cubic
- Same composition but completely different properties
# Example 2: Demonstrating limitations of composition-based features
from pymatgen.core import Structure, Lattice
# Diamond: conventional cubic cell (8 atoms, sp³ tetrahedral network)
diamond_lattice = Lattice.cubic(3.567)
diamond = Structure(diamond_lattice, ["C"] * 8,
                    [[0, 0, 0], [0.5, 0.5, 0], [0.5, 0, 0.5], [0, 0.5, 0.5],
                     [0.25, 0.25, 0.25], [0.75, 0.75, 0.25],
                     [0.75, 0.25, 0.75], [0.25, 0.75, 0.75]])
# Graphite: conventional hexagonal cell (4 atoms, sp² AB-stacked layers)
graphite_lattice = Lattice.hexagonal(2.46, 6.71)
graphite = Structure(graphite_lattice, ["C"] * 4,
                     [[0, 0, 0.25], [0, 0, 0.75],
                      [1/3, 2/3, 0.25], [2/3, 1/3, 0.75]])
# Computing composition-based features
comp_diamond = diamond.composition
comp_graphite = graphite.composition
features_diamond = magpie.featurize(comp_diamond)
features_graphite = magpie.featurize(comp_graphite)
print(f"Diamond and graphite features are identical: {np.allclose(features_diamond, features_graphite)}")
# Output: True (indistinguishable with composition-based approach)
print(f"Actual densities:")
print(f" Diamond: {diamond.density:.2f} g/cm³") # 3.51
print(f" Graphite: {graphite.density:.2f} g/cm³") # 2.26
print(f"Density difference: {abs(diamond.density - graphite.density)/diamond.density * 100:.1f}%")
# Output: Density difference: 35.6% (cannot be captured by composition-based features)
1.2 Graph Representation: A Mathematical Language to Describe Structure
1.2.1 Fundamentals of Graph Theory
A graph \( G = (V, E) \) consists of a vertex set \( V \) (the set of atoms) and an edge set \( E \) (connections between atoms, representing chemical bonds or spatial proximity).
Each vertex \( v_i \in V \) is assigned node features \( \mathbf{x}_i \in \mathbb{R}^d \) such as atomic number, atomic weight, electronegativity, ionic radius, and number of valence electrons.
Each edge \( e_{ij} \in E \) is assigned edge features \( \mathbf{e}_{ij} \in \mathbb{R}^k \) such as interatomic distance \( r_{ij} \), bond type (single bond, double bond, etc.), and bond angle.
Figure: Graph representation of the H₂O molecule. The O node (atomic number 8, electronegativity 3.44) is connected to the H1 and H2 nodes (atomic number 1, electronegativity 2.20) by single-bond edges of length 0.96 Å.
1.2.2 Graph Data Structure in PyTorch Geometric
# Requirements:
# - Python 3.9+
# - torch>=2.0.0, <2.3.0
"""
Example: 1.2.2 Graph Data Structure in PyTorch Geometric
Purpose: Represent the H₂O molecule as a PyTorch Geometric Data object
Target: Advanced
Execution time: ~5 seconds
Dependencies: torch, torch-geometric
"""
# Example 3: Graph representation of H₂O in PyTorch Geometric
!pip install torch-geometric torch-scatter torch-sparse
import torch
from torch_geometric.data import Data
# Graph representation of H₂O molecule
# Node features: [atomic number, electronegativity]
node_features = torch.tensor([
[8, 3.44], # O
[1, 2.20], # H1
[1, 2.20] # H2
], dtype=torch.float)
# Edge list (each undirected bond stored in both directions)
edge_index = torch.tensor([
    [0, 1, 0, 2],  # source nodes
    [1, 0, 2, 0]   # target nodes
], dtype=torch.long)
# Edge features: [interatomic distance (Å)]
edge_attr = torch.tensor([
    [0.96], [0.96],  # O-H1 and H1-O
    [0.96], [0.96]   # O-H2 and H2-O
], dtype=torch.float)
# PyTorch Geometric Data object
data = Data(x=node_features, edge_index=edge_index, edge_attr=edge_attr)
print(f"Number of nodes: {data.num_nodes}") # 3
print(f"Number of edges: {data.num_edges}") # 8 (bidirectional)
print(f"Node feature dimensions: {data.num_node_features}") # 2
print(f"Edge feature dimensions: {data.num_edge_features}") # 1
print(f"\nData structure:")
print(data)
1.2.3 Composition-Based vs Graph-Based: Information Comparison
| Feature | Composition-Based (Magpie) | Graph-Based (GNN) |
|---|---|---|
| Information Source | Chemical composition only | Composition + atomic arrangement |
| Feature Dimensionality | Fixed (132 dimensions) | Variable (depends on number of nodes) |
| Polymorph Distinction | ❌ Impossible | ✅ Possible |
| Local Environment | ❌ Not considered | ✅ Considered via message passing |
| Computational Cost | Low (seconds) | Medium-High (minutes, GPU recommended) |
| Data Requirements | Low (100-1000 samples) | Medium-High (1000-10000 samples) |
1.3 Basic Principles of GNN (Graph Neural Networks)
1.3.1 Concept of Message Passing
The core of GNNs is message passing. Each atom (node) aggregates information from neighboring atoms and updates its own representation.
Mathematical Formulation:
\[ \mathbf{h}_i^{(k+1)} = \text{UPDATE}^{(k)} \left( \mathbf{h}_i^{(k)}, \text{AGGREGATE}^{(k)} \left( \{ \mathbf{h}_j^{(k)} : j \in \mathcal{N}(i) \} \right) \right) \]
Where \( \mathbf{h}_i^{(k)} \) is the hidden representation of node \( i \) at layer \( k \), \( \mathcal{N}(i) \) is the set of neighbor nodes of node \( i \), \( \text{AGGREGATE} \) is the aggregation function for neighbor information (SUM, MEAN, MAX, etc.), and \( \text{UPDATE} \) is the update function for node representation (MLP, GRU, etc.).
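As a concrete instance, the simple GCN layer implemented in Example 4 below chooses a degree-normalized sum for \( \text{AGGREGATE} \) and a shared linear map for \( \text{UPDATE} \):
\[ \mathbf{h}_i^{(k+1)} = \sum_{j \in \mathcal{N}(i) \cup \{i\}} \frac{1}{\sqrt{d_i d_j}} \mathbf{W}^{(k)} \mathbf{h}_j^{(k)} \]
where \( d_i \) is the degree of node \( i \) after self-loops are added. Steps 1-4 in the code map one-to-one onto this formula.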
1.3.2 Implementation of a Simple GNN
# Requirements:
# - Python 3.9+
# - torch>=2.0.0, <2.3.0
# Example 4: Minimal implementation of message passing
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import MessagePassing
from torch_geometric.utils import add_self_loops, degree
class SimpleGCNConv(MessagePassing):
"""Simple Graph Convolutional Network layer"""
def __init__(self, in_channels, out_channels):
super().__init__(aggr='add') # "add" aggregation
self.lin = nn.Linear(in_channels, out_channels)
def forward(self, x, edge_index):
# Step 1: Add self-loops (consider messages from self)
edge_index, _ = add_self_loops(edge_index, num_nodes=x.size(0))
# Step 2: Linear transformation
x = self.lin(x)
# Step 3: Normalization by degree
row, col = edge_index
deg = degree(col, x.size(0), dtype=x.dtype)
deg_inv_sqrt = deg.pow(-0.5)
deg_inv_sqrt[deg_inv_sqrt == float('inf')] = 0
norm = deg_inv_sqrt[row] * deg_inv_sqrt[col]
# Step 4: Message passing
return self.propagate(edge_index, x=x, norm=norm)
def message(self, x_j, norm):
# Normalized message
return norm.view(-1, 1) * x_j
# Test
conv = SimpleGCNConv(in_channels=2, out_channels=8)
h0 = data.x # Node features of H₂O (3 nodes, 2 dimensions)
h1 = conv(h0, data.edge_index)
print(f"Input shape: {h0.shape}") # torch.Size([3, 2])
print(f"Output shape: {h1.shape}") # torch.Size([3, 8])
print(f"Layer 1 output (first node):\n{h1[0]}")
1.3.3 CGCNN vs MPNN: Architectural Differences
CGCNN (Crystal Graph Convolutional Neural Networks) is specialized for crystalline materials. Its target is crystal structures with periodic boundary conditions. Edge features emphasize interatomic distances, and aggregation uses soft attention via an edge gating mechanism (written out below). Applications include the Materials Project and OQMD.
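Concretely, the gated update of Xie & Grossman (2018), omitting their multi-edge index for clarity, is:
\[ \mathbf{h}_i^{(k+1)} = \mathbf{h}_i^{(k)} + \sum_{j \in \mathcal{N}(i)} \sigma\!\left( \mathbf{z}_{ij}^{(k)} \mathbf{W}_f^{(k)} + \mathbf{b}_f^{(k)} \right) \odot g\!\left( \mathbf{z}_{ij}^{(k)} \mathbf{W}_s^{(k)} + \mathbf{b}_s^{(k)} \right), \qquad \mathbf{z}_{ij}^{(k)} = \mathbf{h}_i^{(k)} \oplus \mathbf{h}_j^{(k)} \oplus \mathbf{e}_{ij} \]
where \( \sigma \) is a sigmoid gate that weights each neighbor's contribution by the relevance of its bond, \( g \) is a softplus nonlinearity, and \( \oplus \) denotes concatenation.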
MPNN (Message Passing Neural Networks) is a general framework. Its target includes molecules, proteins, and crystals (all types). The message function is customizable. Aggregation can be chosen from SUM, MEAN, MAX, etc. Applications include QM9, ZINC, and ChEMBL.
| Feature | CGCNN | MPNN |
|---|---|---|
| Paper | Xie & Grossman (2018) | Gilmer et al. (2017) |
| Main Target | Crystalline materials | Both molecules and crystals |
| Edge Processing | Gating mechanism | General message function |
| Aggregation Method | Weighted SUM | SUM/MEAN/MAX |
| Periodic Boundary Conditions | ✅ Considered | ❌ Not supported by default |
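To make the contrast concrete, here is a minimal MPNN-style layer. It is a sketch rather than the reference implementation of Gilmer et al. (2017); the class name SimpleMPNNConv and its parameters are our own. Unlike SimpleGCNConv above, the message is a learned function of both the neighbor state and the edge feature, and the aggregation is a free design choice.
# Sketch: minimal MPNN-style layer with a learned, edge-aware message function
import torch
import torch.nn as nn
from torch_geometric.nn import MessagePassing

class SimpleMPNNConv(MessagePassing):
    """Message = MLP(neighbor state, edge feature); aggregation is a design choice."""
    def __init__(self, in_channels, edge_channels, out_channels):
        super().__init__(aggr='mean')  # SUM/MEAN/MAX are all legal in the MPNN framework
        self.msg_mlp = nn.Sequential(
            nn.Linear(in_channels + edge_channels, out_channels),
            nn.ReLU()
        )
        self.update_lin = nn.Linear(in_channels + out_channels, out_channels)

    def forward(self, x, edge_index, edge_attr):
        m = self.propagate(edge_index, x=x, edge_attr=edge_attr)  # message + aggregate
        return self.update_lin(torch.cat([x, m], dim=-1))          # update

    def message(self, x_j, edge_attr):
        # m_ij = MLP(h_j, e_ij): edge features enter the message directly
        return self.msg_mlp(torch.cat([x_j, edge_attr], dim=-1))

# Applied to the H₂O graph from Example 3
mpnn = SimpleMPNNConv(in_channels=2, edge_channels=1, out_channels=8)
out = mpnn(data.x, data.edge_index, data.edge_attr)
print(out.shape)  # torch.Size([3, 8])
Swapping aggr='mean' for 'add' or 'max' changes the aggregation without touching the rest of the layer.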
1.4 Example: Distinguishing Diamond and Graphite
# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0
# Example 5: Distinguishing diamond and graphite with GNN
from torch_geometric.data import Data
import numpy as np
def structure_to_graph(structure, cutoff=5.0):
    """Create a PyTorch Geometric graph from a pymatgen structure"""
    # Node features: atomic number
    x = torch.tensor([[site.specie.Z] for site in structure], dtype=torch.float)
    # Edge list: all neighbors within the cutoff radius.
    # get_all_neighbors respects periodic boundary conditions and counts every
    # periodic image, which a naive pairwise get_distance loop would miss.
    edges = []
    edge_attrs = []
    for i, neighbors in enumerate(structure.get_all_neighbors(cutoff)):
        for neighbor in neighbors:
            edges.append([i, neighbor.index])
            edge_attrs.append([neighbor.nn_distance])
    edge_index = torch.tensor(edges, dtype=torch.long).t().contiguous()
    edge_attr = torch.tensor(edge_attrs, dtype=torch.float)
    return Data(x=x, edge_index=edge_index, edge_attr=edge_attr)
# Convert diamond and graphite to graphs
graph_diamond = structure_to_graph(diamond, cutoff=2.0)
graph_graphite = structure_to_graph(graphite, cutoff=2.0)
print("Diamond graph:")
print(f" Number of nodes: {graph_diamond.num_nodes}")
print(f" Number of edges: {graph_diamond.num_edges}")
print(f" Average degree: {graph_diamond.num_edges / graph_diamond.num_nodes:.2f}")
print("\nGraphite graph:")
print(f" Number of nodes: {graph_graphite.num_nodes}")
print(f" Number of edges: {graph_graphite.num_edges}")
print(f" Average degree: {graph_graphite.num_edges / graph_graphite.num_nodes:.2f}")
# Output example:
# Diamond: Average degree 4.00 (sp³, tetrahedral)
# Graphite: Average degree 3.00 (sp², planar trigonal)
# → Distinguishable from graph structure!
1.5 Fundamentals of PyTorch Geometric
1.5.1 Details of Data Object
# Example 6: Attributes of PyTorch Geometric Data object
from torch_geometric.data import Data
# More complex example: CO₂ molecule
# O=C=O (linear structure)
node_features = torch.tensor([
[8, 3.44, 6], # O1: atomic number, electronegativity, valence electrons
[6, 2.55, 4], # C: atomic number, electronegativity, valence electrons
[8, 3.44, 6] # O2: atomic number, electronegativity, valence electrons
], dtype=torch.float)
edge_index = torch.tensor([
    [0, 1, 1, 2],
    [1, 0, 2, 1]
], dtype=torch.long)
edge_attr = torch.tensor([
    [1.16, 2],  # O1→C: distance (Å), bond order (double bond)
    [1.16, 2],  # C→O1 (reverse direction)
    [1.16, 2],  # C→O2
    [1.16, 2]   # O2→C (reverse direction)
], dtype=torch.float)
# Target value: dipole moment (Debye)
y = torch.tensor([[0.0]], dtype=torch.float) # CO₂ is symmetric, so 0
data_co2 = Data(x=node_features, edge_index=edge_index,
edge_attr=edge_attr, y=y)
print("CO₂ molecule graph data:")
print(f" Node feature shape: {data_co2.x.shape}") # [3, 3]
print(f" Edge index shape: {data_co2.edge_index.shape}") # [2, 8]
print(f" Edge feature shape: {data_co2.edge_attr.shape}") # [8, 2]
print(f" Target value: {data_co2.y.item():.2f} Debye")
print(f" Directed graph: {data_co2.is_directed()}") # True
1.5.2 DataLoader and Batch Processing
# Example 7: Implementing batch processing
from torch_geometric.loader import DataLoader
from torch_geometric.data import Batch
# Create multiple molecule data
# Note: graphs in one batch must share feature dimensionality, so CO₂ is rebuilt
# here with the same schema as the H₂O graph from Example 3
# ([atomic number, electronegativity] node features, [distance] edge features)
co2_small = Data(
    x=torch.tensor([[8, 3.44], [6, 2.55], [8, 3.44]], dtype=torch.float),
    edge_index=torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]], dtype=torch.long),
    edge_attr=torch.tensor([[1.16], [1.16], [1.16], [1.16]], dtype=torch.float)
)
molecules = [data, co2_small]  # H₂O and CO₂
# Create DataLoader
loader = DataLoader(molecules, batch_size=2, shuffle=True)
for batch in loader:
print("Batch data:")
print(f" Batch size: {batch.num_graphs}")
print(f" Total number of nodes: {batch.num_nodes}")
print(f" Total number of edges: {batch.num_edges}")
print(f" Node feature shape: {batch.x.shape}")
print(f" Batch vector: {batch.batch}")
# Batch vector example: tensor([0, 0, 0, 1, 1, 1])
# First 3 nodes belong to molecule 0, next 3 nodes to molecule 1
break
Advantages of batch processing: Speedup through GPU parallelization, improved memory efficiency, and application of mini-batch gradient descent.
1.5.3 Graph Pooling: Molecular-Level Representation
# Example 8: Global pooling
from torch_geometric.nn import global_mean_pool, global_max_pool, global_add_pool
# Create graph-level representation from node features
batch = Batch.from_data_list(molecules)
# Mean pooling
graph_emb_mean = global_mean_pool(batch.x, batch.batch)
print(f"Mean pooling output shape: {graph_emb_mean.shape}") # [2, feature_dim]
# Max pooling
graph_emb_max = global_max_pool(batch.x, batch.batch)
print(f"Max pooling output shape: {graph_emb_max.shape}")
# Sum pooling
graph_emb_sum = global_add_pool(batch.x, batch.batch)
print(f"Sum pooling output shape: {graph_emb_sum.shape}")
When to use each pooling method: mean pooling is invariant to molecule size (recommended default); max pooling emphasizes the most influential atoms; sum pooling depends on molecule size (suitable for additive properties).
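A quick sanity check makes the size dependence visible. This is a minimal sketch with hand-built batch assignments (x_toy and batch_toy are illustrative names), independent of the molecules above:
# Every node carries the same feature value; graph 0 has 2 nodes, graph 1 has 4
import torch
from torch_geometric.nn import global_mean_pool, global_add_pool

x_toy = torch.ones(6, 1)
batch_toy = torch.tensor([0, 0, 1, 1, 1, 1])
print(global_mean_pool(x_toy, batch_toy).squeeze())  # tensor([1., 1.]) → size-invariant
print(global_add_pool(x_toy, batch_toy).squeeze())   # tensor([2., 4.]) → size-dependent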
1.6 Summary
In this chapter, we learned the fundamentals of GNN structure-based features:
1. Limitations of composition-based features — Cannot distinguish diamond and graphite.
2. Strengths of graph representation — Mathematically describes atomic arrangement.
3. Message passing — Core computational process of GNNs.
4. CGCNN vs MPNN — Crystal-specific vs general framework.
5. PyTorch Geometric — Implementation foundation for graph deep learning.
In the next chapter, we will learn the detailed implementation of CGCNN and crystal property prediction using Materials Project.
Exercises
Easy (Basic Confirmation)
Q1: What information cannot be captured by composition-based features?
Answer: Spatial arrangement of atoms (structural information)
Explanation:
Composition-based features (Magpie, etc.) calculate statistical features from chemical composition (Fe₂O₃, etc.), but do not consider the following at all:
- 3D coordinates of atoms
- Bond lengths, bond angles
- Crystal structure (fcc, bcc, etc.)
- Local environment (coordination number, coordination polyhedra)
Therefore, they cannot distinguish diamond and graphite (both are pure carbon).
Q2: In PyTorch Geometric's Data object, which attribute stores node features?
Answer: x
Explanation:
Main attributes of PyTorch Geometric Data object:
- data.x: Node features (shape: [num_nodes, num_features])
- data.edge_index: Edge list (shape: [2, num_edges])
- data.edge_attr: Edge features (shape: [num_edges, num_edge_features])
- data.y: Target value (property to predict)
Q3: What are the three main steps of message passing?
Answer: Message (message generation), Aggregate (aggregation), Update (update)
Explanation:
- Message: Generate messages on each edge
- Example: \( m_{ij} = \text{MLP}(\mathbf{h}_j, \mathbf{e}_{ij}) \)
- Aggregate: Aggregate messages from neighbors
- Example: \( m_i = \sum_{j \in \mathcal{N}(i)} m_{ij} \)
- Update: Update node representation
- Example: \( \mathbf{h}_i^{(k+1)} = \text{GRU}(\mathbf{h}_i^{(k)}, m_i) \)
Medium (Application)
Q4: Explain the difference in graph structure between diamond and graphite using average degree.
Answer: Diamond (average degree ≈ 4.0) vs Graphite (average degree ≈ 3.0)
Explanation:
- Diamond:
- sp³ hybrid orbitals
- Each carbon atom bonds with 4 neighboring atoms
- Tetrahedral structure
- Average degree ≈ 4.0
- Graphite:
- sp² hybrid orbitals
- Each carbon atom bonds with 3 neighboring atoms
- Planar trigonal structure (layered)
- Average degree ≈ 3.0
GNNs can learn this difference in degree and distinguish between the two.
Q5: Give two reasons why CGCNN is suitable for crystalline materials compared to MPNN.
Answer: (1) Consideration of periodic boundary conditions, (2) Edge gating mechanism
Explanation:
- Periodic Boundary Conditions:
- Crystals are periodic structures that repeat infinitely
- CGCNN considers atoms within the unit cell and periodically repeated neighboring atoms
- MPNN is non-periodic by default (designed for molecules)
- Edge Gating Mechanism:
- Weighting according to interatomic distance
- Suppresses messages from distant atoms
- Appropriately models local environment of crystals
Q6: Create a graph for the CO₂ molecule (O=C=O) in PyTorch Geometric. Use only atomic number for node features.
Solution Example:
# Requirements:
# - Python 3.9+
# - torch>=2.0.0, <2.3.0
"""
Example: Q6 solution — CO₂ graph construction
Purpose: Build a minimal Data object with atomic-number node features
Target: Advanced
Execution time: ~5 seconds
Dependencies: torch, torch-geometric
"""
import torch
from torch_geometric.data import Data
# Node features: atomic number
x = torch.tensor([[8], [6], [8]], dtype=torch.float) # O, C, O
# Edge list (each bond stored in both directions)
edge_index = torch.tensor([
    [0, 1, 1, 2],  # O1→C, C→O1, C→O2, O2→C
    [1, 0, 2, 1]
], dtype=torch.long)
data_co2 = Data(x=x, edge_index=edge_index)
print(data_co2)
Explanation:
- Node 0: O (atomic number 8)
- Node 1: C (atomic number 6)
- Node 2: O (atomic number 8)
- Edges: O-C, C-O (bidirectional for undirected graph)
Hard (Advanced)
Q7: Compare the data efficiency of composition-based features and GNN features. Which is more advantageous with small data (100 samples) and large data (10,000 samples)? Explain with reasons.
Answer:
Small Data (100 samples): Composition-based features are advantageous
- Reason 1: Fixed dimensionality (132 dimensions) is less prone to overfitting
- Reason 2: Domain knowledge (electronegativity, ionic radius, etc.) is incorporated
- Reason 3: High accuracy even with linear models (Ridge, Lasso)
Large Data (10,000 samples): GNN features are advantageous
- Reason 1: Utilizes structural information, achieving higher accuracy than composition-based
- Reason 2: Can fully leverage the expressive power of deep learning
- Reason 3: Further accuracy improvement with transfer learning (pre-trained models)
Quantitative Comparison (Materials Project Data):
| Data Size | Composition-Based (MAE) | GNN (MAE) |
|---|---|---|
| 100 samples | 0.25 eV/atom | 0.35 eV/atom (overfitting) |
| 1,000 samples | 0.18 eV/atom | 0.15 eV/atom |
| 10,000 samples | 0.15 eV/atom | 0.08 eV/atom ⭐ |
Q8: Compare the characteristics of aggregation functions in message passing (SUM, MEAN, MAX) and explain situations where each is appropriate.
Answer:
| Aggregation Function | Formula | Characteristics | Appropriate Situations |
|---|---|---|---|
| SUM | \( \sum_{j \in \mathcal{N}(i)} m_{ij} \) | Additive, degree-dependent | Stoichiometric properties (mass, charge) |
| MEAN | \( \frac{1}{|\mathcal{N}(i)|} \sum_{j} m_{ij} \) | Normalized, degree-invariant | Average local environment properties (electronegativity) |
| MAX | \( \max_{j \in \mathcal{N}(i)} m_{ij} \) | Max value extraction | Emphasize most important features (active site detection) |
Practical Recommendations:
- MEAN: Recommended by default (degree-invariant, stable learning)
- SUM: When predicting additive properties
- MAX: Local anomaly detection, identifying catalytic active sites
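In PyTorch Geometric, many built-in layers expose the aggregation as a constructor argument, so switching between them is a one-line change. A sketch using SAGEConv as one example (recent PyG versions accept 'sum', 'mean', and 'max' here):
from torch_geometric.nn import SAGEConv

conv_mean = SAGEConv(in_channels=2, out_channels=8, aggr='mean')  # default: degree-invariant
conv_max = SAGEConv(in_channels=2, out_channels=8, aggr='max')    # strongest neighbor signal
conv_sum = SAGEConv(in_channels=2, out_channels=8, aggr='sum')    # additive properties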
Q9: Discuss the impact of cutoff radius selection on prediction accuracy in graph representations of crystalline materials. Explain the problems when cutoff radius is too short and too long.
Answer:
When cutoff radius is too short (e.g., 2Å):
- Problem 1: Considers only first neighbors, ignores long-range interactions
- Problem 2: Few edges, slow information propagation
- Problem 3: Requires multi-layer GNN (increased computational cost)
- Example: Cannot capture Coulomb interactions in ionic crystals (long-range)
When cutoff radius is too long (e.g., 10Å):
- Problem 1: Explosive increase in edges (O(N²))
- Problem 2: Memory exhaustion, increased computation time
- Problem 3: Increased noise (overestimating influence of distant atoms)
- Problem 4: Increased risk of overfitting
Selecting Optimal Cutoff Radius:
| Material Type | Recommended Cutoff Radius | Reason |
|---|---|---|
| Covalent crystals (Si, Diamond) | 4-5Å | Consider up to second neighbors |
| Ionic crystals (NaCl, MgO) | 6-8Å | Long-range Coulomb interactions |
| Metals (Fe, Cu) | 5-6Å | Consider up to third neighbors |
| Molecular crystals | 8-10Å | Intermolecular interactions (van der Waals forces) |
Experimental Optimization:
# Evaluate impact of cutoff radius
# (structure: any pymatgen Structure; structure_to_graph from Example 5)
cutoffs = [3.0, 4.0, 5.0, 6.0, 8.0, 10.0]
for cutoff in cutoffs:
    graph = structure_to_graph(structure, cutoff=cutoff)
    print(f"Cutoff {cutoff}Å: {graph.num_edges} edges")
    # Train a model at each cutoff and compare validation accuracy
Learning Objectives Checklist
Upon completing this chapter, you should be able to explain the following:
Basic Understanding
- ✅ List three limitations of composition-based features with concrete examples
- ✅ Explain the mathematical definition of graph representation (\( G = (V, E) \))
- ✅ Understand the three steps of message passing
- ✅ Explain the differences between CGCNN and MPNN
Practical Skills
- ✅ Create Data objects in PyTorch Geometric
- ✅ Construct graphs from pymatgen structures
- ✅ Distinguish diamond and graphite by graph structure
- ✅ Implement batch processing with DataLoader
Application Ability
- ✅ Choose between composition-based and GNN for new materials problems
- ✅ Experimentally determine optimal cutoff radius
- ✅ Infer material properties from graph structure characteristics (average degree, density)
Next Steps
In the next chapter, we will learn the detailed implementation of CGCNN and formation energy prediction using Materials Project.
References
- Ward, L., Agrawal, A., Choudhary, A., & Wolverton, C. (2016). "A general-purpose machine learning framework for predicting properties of inorganic materials." npj Computational Materials, 2, 16028, pp. 1-7.
- Xie, T., & Grossman, J. C. (2018). "Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties." Physical Review Letters, 120(14), 145301, pp. 1-6.
- Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., & Dahl, G. E. (2017). "Neural Message Passing for Quantum Chemistry." Proceedings of the 34th International Conference on Machine Learning, PMLR 70, pp. 1263-1272.
- Fey, M., & Lenssen, J. E. (2019). "Fast Graph Representation Learning with PyTorch Geometric." ICLR Workshop on Representation Learning on Graphs and Manifolds, pp. 1-9.
- Jain, A., Ong, S. P., Hautier, G., Chen, W., Richards, W. D., Dacek, S., ... & Persson, K. A. (2013). "Commentary: The Materials Project: A materials genome approach to accelerating materials innovation." APL Materials, 1(1), 011002, pp. 1-11.
- Ong, S. P., Richards, W. D., Jain, A., Hautier, G., Kocher, M., Cholia, S., ... & Persson, K. A. (2013). "Python Materials Genomics (pymatgen): A robust, open-source python library for materials analysis." Computational Materials Science, 68, pp. 314-319.
- Schütt, K. T., Sauceda, H. E., Kindermans, P. J., Tkatchenko, A., & Müller, K. R. (2018). "SchNet – A deep learning architecture for molecules and materials." The Journal of Chemical Physics, 148(24), 241722, pp. 1-10.
Disclaimer
- This content is provided solely for educational, research, and informational purposes and does not constitute professional advice (legal, accounting, technical warranty, etc.).
- This content and accompanying code examples are provided "AS IS" without any warranty, express or implied, including but not limited to merchantability, fitness for a particular purpose, non-infringement, accuracy, completeness, operation, or safety.
- The author and Tohoku University assume no responsibility for the content, availability, or safety of external links, third-party data, tools, libraries, etc.
- To the maximum extent permitted by applicable law, the author and Tohoku University shall not be liable for any direct, indirect, incidental, special, consequential, or punitive damages arising from the use, execution, or interpretation of this content.
- The content may be changed, updated, or discontinued without notice.
- The copyright and license of this content are subject to the stated conditions (e.g., CC BY 4.0). Such licenses typically include no-warranty clauses.