Introduction to Materials Databases Series v1.0

Series Overview

This four-chapter series teaches materials databases, one of the core resources of materials science. It is designed both for readers starting from scratch and for those who want to build practical skills step by step.

Materials databases are large, systematically organized repositories of density functional theory (DFT) calculation results and experimental data. Major databases used by researchers worldwide, such as Materials Project (140k materials), AFLOW (3.5M structures), and OQMD (1M materials), consolidate decades of accumulated materials property data.

Chapter Contents

Chapter 1: Complete Overview of Materials Databases

Difficulty: Beginner | Reading Time: 20-25 minutes | Code Examples: 10

Compare the four major materials databases (Materials Project (MP), AFLOW, OQMD, and JARVIS) and learn to select the one that fits your research objective. Then work through the practical steps, from obtaining a Materials Project API key to basic data retrieval.

Comparison of the four major databases
API authentication and access methods
Basics of data retrieval
History of materials databases

Chapter 2: Complete Guide to Materials Project

Difficulty: Beginner to Intermediate | Reading Time: 30-35 minutes | Code Examples: 18

Work toward fluency with pymatgen and the MPRester API. Build practical skills step by step, covering advanced query techniques, batch downloads, and data visualization.

pymatgen fundamentals
MPRester API details
Advanced query techniques
Batch downloads
Data visualization

Chapter 3: Database Integration and Workflow

Difficulty: Intermediate | Reading Time: 20-25 minutes | Code Examples: 12

Combine data from multiple databases and build pipelines for data cleaning, missing value handling, and automated updates. Practical case studies show why data quality management matters.

Integration of multiple databases
Data cleaning
Missing value handling
Automated update pipelines
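
As a taste of the integration and missing value topics listed above, here is a minimal sketch in plain Python. It assumes two hypothetical sources keyed by chemical formula; the formulas, field names, and numeric values are made-up placeholders, not entries from any real database, and the chapter itself may use different tools such as pandas.

```python
# Hypothetical records from two sources, keyed by chemical formula.
# All names and values are illustrative placeholders, not real data.
source_a = {
    "A2B": {"band_gap_eV": 1.0, "formation_energy_eV": -0.5},
    "AB3": {"band_gap_eV": None, "formation_energy_eV": -1.2},
}
source_b = {
    "AB3": {"band_gap_eV": 2.0},
    "C2D": {"band_gap_eV": 0.0},
}

FIELDS = ("band_gap_eV", "formation_energy_eV")


def merge_records(primary, secondary, fields=FIELDS):
    """Outer-join two record dicts, filling gaps in primary from secondary."""
    merged = {}
    for formula in sorted(set(primary) | set(secondary)):
        row = {}
        for field in fields:
            value = primary.get(formula, {}).get(field)
            # Compare with None explicitly so that 0.0 counts as a real value.
            if value is None:
                value = secondary.get(formula, {}).get(field)
            row[field] = value
        merged[formula] = row
    return merged


def missing_fields(merged):
    """Map each formula to the fields that are still missing after merging."""
    report = {}
    for formula, row in merged.items():
        gaps = [field for field, value in row.items() if value is None]
        if gaps:
            report[formula] = gaps
    return report


merged = merge_records(source_a, source_b)
gaps = missing_fields(merged)
```

The explicit `is None` check is a deliberate choice: a band gap of 0.0 is a legitimate value and should not be treated as missing.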

Chapter 4: Building Custom Databases

Difficulty: Intermediate | Reading Time: 15-20 minutes | Code Examples: 10

Learn how to structure and publish experimental data, starting with a local SQLite database and moving up to PostgreSQL. Practice schema design, CRUD operations, and backup strategies, then publish a dataset on Zenodo and obtain a DOI.

Database design fundamentals
Local DB with SQLite
PostgreSQL/MySQL
Backup strategies
Data publication and DOI acquisition
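
To illustrate the schema design and CRUD topics listed above, the following is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table name `samples`, its columns, and all inserted values are hypothetical placeholders chosen for this example, not the schema used in the chapter.

```python
import sqlite3

# An in-memory database keeps the sketch self-contained; pass a file path
# instead of ":memory:" to persist the data on disk.
conn = sqlite3.connect(":memory:")

# Schema: a hypothetical table of measured samples (placeholder columns).
conn.execute(
    """
    CREATE TABLE samples (
        id INTEGER PRIMARY KEY,
        formula TEXT NOT NULL,
        band_gap_ev REAL,
        measured_on TEXT
    )
    """
)

# Create: insert placeholder rows using parameterized queries.
conn.executemany(
    "INSERT INTO samples (formula, band_gap_ev, measured_on) VALUES (?, ?, ?)",
    [("A2B", 1.0, "2024-01-01"), ("AB3", None, "2024-01-02")],
)

# Read: a missing measurement comes back as None (SQL NULL).
rows = conn.execute(
    "SELECT formula, band_gap_ev FROM samples ORDER BY formula"
).fetchall()

# Update: correct a stored value.
conn.execute(
    "UPDATE samples SET band_gap_ev = ? WHERE formula = ?", (1.1, "A2B")
)

# Delete: remove a row that has no measurement.
conn.execute("DELETE FROM samples WHERE band_gap_ev IS NULL")
conn.commit()

remaining = conn.execute(
    "SELECT formula, band_gap_ev FROM samples ORDER BY formula"
).fetchall()
conn.close()
```

Parameterized queries (the `?` placeholders) are used throughout so that values are never spliced into SQL strings by hand.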

How to Proceed with Learning

For Beginners: Chapter 1 → Chapter 2 → Chapter 3 → Chapter 4 (all chapters recommended)

For Intermediate Learners: Chapter 2 (advanced queries) → Chapter 3 → Chapter 4

For Specific Skill Enhancement: Select only the chapters you need

Prerequisites

Python fundamentals (variables, functions, lists, dictionaries)
Basic pandas operations (recommended)