AI Terakoya - Materials Informatics Knowledge Hub

📖 Reading Time: 355-460 minutes 📊 Level:


Learning Platform for Data-Driven Materials Development


🏫 Welcome to AI Terakoya

"Terakoya" were educational institutions for common people during Japan's Edo period. The modern "AI Terakoya" is a comprehensive learning platform for the convergence of materials science and data science.

Features of AI Terakoya:
- ✅ Four Specialized Series: Comprehensive coverage of MI, NM, PI, and MLP
- ✅ Gradual Learning: Systematic progression from beginner to advanced across 16 chapters
- ✅ Practice-Oriented: 115 executable code examples
- ✅ Industrial Applications: 20+ real-world case studies
- ✅ Career Support: Concrete career paths and learning roadmaps

Total Learning Time: 355-460 minutes (approximately 6-8 hours)


📚 Four Introduction Series

📘 Materials Informatics (MI) Introduction

Materials Informatics Introduction Series

Foundational series for learning AI/machine learning applications across materials science

Overview:
- 🎯 Target Areas: Materials discovery, property prediction, database utilization
- 📊 Difficulty: Beginner to Advanced
- ⏱️ Learning Time: 90-120 minutes (4 chapters)
- 💻 Code Examples: 35 (all executable)
- 🔬 Applications: Li-ion batteries, catalysts, high-entropy alloys, perovskite solar cells

Key Learning Content:
1. History of materials development and limitations of traditional methods
2. Utilization of major databases like Materials Project
3. Implementation of 5 machine learning models (Linear Regression, Random Forest, LightGBM, SVR, MLP (multilayer perceptron)) plus Materials Project API integration
4. Feature engineering with Matminer
5. Hyperparameter tuning (Grid/Random Search)
6. 5 industrial case studies
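Feature engineering (item 4) turns a chemical formula into numeric descriptors. Matminer does this with curated elemental data and many featurizers; the plain-Python sketch below only illustrates the underlying idea. The `parse_formula` and `featurize` helpers and the small `ATOMIC_MASS` table (standard atomic weights for five elements) are hypothetical stand-ins, not Matminer's API, and the parser handles only flat formulas without parentheses.

```python
import re

# Hypothetical mini lookup table: standard atomic weights (g/mol).
# A real workflow would draw many elemental properties from a curated source such as matminer.
ATOMIC_MASS = {"Li": 6.94, "Fe": 55.845, "P": 30.974, "O": 15.999, "Co": 58.933}


def parse_formula(formula):
    """Parse a flat formula such as 'LiFePO4' into {element: count} (no parentheses)."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        counts[element] = counts.get(element, 0.0) + (float(number) if number else 1.0)
    return counts


def featurize(formula):
    """Composition-weighted statistics of atomic mass (hypothetical helper)."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    fractions = {el: n / total for el, n in counts.items()}
    masses = [ATOMIC_MASS[el] for el in fractions]
    return {
        # Mole-fraction-weighted mean of the elemental property
        "mean_mass": sum(frac * ATOMIC_MASS[el] for el, frac in fractions.items()),
        "min_mass": min(masses),
        "max_mass": max(masses),
        "range_mass": max(masses) - min(masses),
    }


for formula in ["LiFePO4", "LiCoO2"]:
    print(formula, featurize(formula))
```

The same pattern (composition fractions times an elemental property, then mean/min/max/range) extends to electronegativity, atomic radius, and other tabulated properties.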

Tools Used:
- Python: scikit-learn, matminer, pandas, numpy
- Databases: Materials Project API
- Visualization: matplotlib, seaborn

📘 Go to MI Introduction Series →


📗 Nanomaterials (NM) Introduction

Nanomaterials Introduction Series

Learning Nanomaterial Science through Python Practice

Overview:
- 🎯 Target Areas: Nanoparticles, carbon nanotubes, graphene, quantum dots
- 📊 Difficulty: Beginner to Intermediate
- ⏱️ Learning Time: 90-120 minutes (4 chapters)
- 💻 Code Examples: 30-35 (all executable)
- 🔬 Applications: CNT composites, quantum dot luminescence, gold nanoparticle catalysts, nanomedicine

Key Learning Content:
1. Definition of nanoscale and size effects, quantum confinement effects
2. Synthesis methods (bottom-up/top-down) and characterization (TEM, SEM, XRD, UV-Vis)
3. Property prediction using 5 regression models
4. Nanomaterial design with Bayesian optimization
5. Molecular dynamics (MD) data analysis
6. Prediction interpretation with SHAP analysis
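Quantum confinement (item 1) is commonly illustrated with the Brus effective-mass model, in which the band gap of a spherical nanocrystal of radius R is approximately E(R) = Eg(bulk) + ħ²π²/(2R²)·(1/me* + 1/mh*) - 1.8e²/(4πε₀εᵣR). The sketch below evaluates that expression in Python. The default CdSe parameters (Eg ≈ 1.74 eV, me* ≈ 0.13 m₀, mh* ≈ 0.45 m₀, εᵣ ≈ 10.6) are commonly quoted textbook values used here as assumptions, and `brus_gap_ev` is a hypothetical helper name.

```python
import math

# Physical constants (SI units)
HBAR = 1.054571817e-34      # reduced Planck constant, J*s
M0 = 9.1093837015e-31       # electron rest mass, kg
E_CHARGE = 1.602176634e-19  # elementary charge, C
EPS0 = 8.8541878128e-12     # vacuum permittivity, F/m


def brus_gap_ev(radius_nm, eg_bulk_ev=1.74, me_eff=0.13, mh_eff=0.45, eps_r=10.6):
    """Band gap (eV) of a spherical nanocrystal from the Brus model.

    Defaults are assumed, commonly quoted values for CdSe.
    """
    r = radius_nm * 1e-9
    # Confinement (kinetic) term: raises the gap as 1/R^2
    confinement_j = (HBAR ** 2 * math.pi ** 2) / (2 * r ** 2) * (
        1 / (me_eff * M0) + 1 / (mh_eff * M0)
    )
    # Screened electron-hole Coulomb attraction: lowers the gap as 1/R
    coulomb_j = 1.8 * E_CHARGE ** 2 / (4 * math.pi * EPS0 * eps_r * r)
    return eg_bulk_ev + (confinement_j - coulomb_j) / E_CHARGE


for radius in [1.5, 2.0, 3.0, 5.0]:
    print(f"R = {radius:.1f} nm -> Eg = {brus_gap_ev(radius):.2f} eV")
```

The effective-mass picture is known to overestimate the gap for the smallest particles, so the output is best read as a qualitative trend: smaller particles, larger gap, blue-shifted emission.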

Tools Used:
- Python: scikit-learn, LightGBM, scikit-optimize, SHAP
- Analysis: pandas, numpy, scipy
- Visualization: matplotlib, seaborn

📗 Go to NM Introduction Series →


📙 Process Informatics (PI) Introduction

Process Informatics Introduction Series

The Future of Chemical Process Optimization through Data

Overview:
- 🎯 Target Areas: Chemical process optimization, digital twins, quality control
- 📊 Difficulty: Beginner to Advanced
- ⏱️ Learning Time: 90-120 minutes (4 chapters)
- 💻 Code Examples: 35 (all executable)
- 🔬 Applications: Catalytic processes, polymerization reaction control, distillation column optimization, bioprocesses

Key Learning Content:
1. History of chemical process development and limitations of traditional methods (1-3 years for scale-up)
2. Types of process data (temperature, pressure, flow rate, yield, selectivity)
3. 6 machine learning methods (Linear Regression, Random Forest, LightGBM, SVR, time series analysis, Bayesian optimization)
4. Multi-objective optimization (yield vs. cost)
5. Grid Search/Bayesian Optimization
6. 5 industrial case studies (yield improvement 70%→85%, etc.)
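Multi-objective optimization (item 4, yield vs. cost) usually ends not in a single optimum but in a Pareto front: the conditions that no other condition beats on both objectives at once. A minimal sketch of that filtering step in plain Python; the five candidate conditions and their numbers are invented purely for illustration, and `dominates` / `pareto_front` are hypothetical helpers, not part of scikit-optimize or scipy.

```python
def dominates(a, b):
    """True if a is no worse than b on both objectives (higher yield, lower cost)
    and strictly better on at least one of them."""
    no_worse = a["yield"] >= b["yield"] and a["cost"] <= b["cost"]
    strictly_better = a["yield"] > b["yield"] or a["cost"] < b["cost"]
    return no_worse and strictly_better


def pareto_front(candidates):
    """Return the non-dominated candidates, sorted by increasing cost."""
    front = [c for c in candidates
             if not any(dominates(other, c) for other in candidates)]
    return sorted(front, key=lambda c: c["cost"])


# Hypothetical reaction conditions; yields and relative costs are made up
candidates = [
    {"name": "A", "yield": 0.70, "cost": 100},
    {"name": "B", "yield": 0.78, "cost": 120},
    {"name": "C", "yield": 0.85, "cost": 150},
    {"name": "D", "yield": 0.76, "cost": 140},  # dominated by B
    {"name": "E", "yield": 0.84, "cost": 170},  # dominated by C
]

for c in pareto_front(candidates):
    print(f'{c["name"]}: yield {c["yield"]:.0%}, cost {c["cost"]}')
```

Picking one point from the front still needs an external criterion, for example the cheapest condition that meets a minimum yield specification.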

Tools Used:
- Python: scikit-learn, LightGBM, Prophet, ARIMA
- Optimization: scipy.optimize, scikit-optimize
- Visualization: matplotlib, seaborn

📙 Go to PI Introduction Series →


📕 Machine Learning Potentials (MLP) Introduction

Machine Learning Potential Introduction Series

Next-Generation Simulation Combining Quantum Accuracy with Classical Speed

Overview:
- 🎯 Target Areas: Molecular simulation acceleration, reaction pathway exploration, catalyst design
- 📊 Difficulty: Beginner to Advanced
- ⏱️ Learning Time: 85-100 minutes (4 chapters)
- 💻 Code Examples: 15 (all executable)
- 🔬 Applications: Cu catalyst CO₂ reduction, Li-ion battery electrolytes, protein folding, GaN semiconductors

Key Learning Content:
1. History of molecular simulation (DFT vs classical MD vs MLP)
2. Machine learning approximation of potential energy surfaces
3. MLP training with SchNetPack (MD17 dataset, MAE < 1 kcal/mol)
4. MLP-MD execution (50,000× speedup over DFT)
5. Calculation of vibrational spectra, diffusion coefficients, radial distribution functions (RDF)
6. Efficient data collection with Active Learning
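Item 5 mentions radial distribution functions. For a single MD frame, g(r) can be estimated by histogramming all pair distances under the minimum-image convention and dividing each bin by the count expected for an ideal gas at the same density. The plain-Python sketch below does this for a cubic periodic box; `rdf` is a hypothetical helper (ASE and other packages ship their own analysis tools), and the O(N²) pair loop is only suitable for small systems. The usage example checks it on a simple cubic lattice, whose first coordination shell sits at the lattice spacing.

```python
import math


def rdf(positions, box_length, r_max, n_bins):
    """Estimate g(r) for one frame in a cubic periodic box of edge box_length.

    positions: list of (x, y, z) tuples. Keep r_max <= box_length / 2 so the
    minimum-image convention picks the correct nearest periodic image.
    """
    n = len(positions)
    dr = r_max / n_bins
    hist = [0] * n_bins
    for i in range(n):
        for j in range(i + 1, n):
            d2 = 0.0
            for k in range(3):
                delta = positions[i][k] - positions[j][k]
                delta -= box_length * round(delta / box_length)  # minimum image
                d2 += delta * delta
            r = math.sqrt(d2)
            if r < r_max:
                hist[min(int(r / dr), n_bins - 1)] += 2  # pair counted for both atoms
    density = n / box_length ** 3
    centers, g = [], []
    for b, count in enumerate(hist):
        r_lo, r_hi = b * dr, (b + 1) * dr
        shell_volume = 4.0 / 3.0 * math.pi * (r_hi ** 3 - r_lo ** 3)
        # Normalize by the ideal-gas neighbor count n * density * shell_volume
        g.append(count / (n * density * shell_volume))
        centers.append(0.5 * (r_lo + r_hi))
    return centers, g


# Toy frame: 4 x 4 x 4 simple cubic lattice, spacing 1.0, periodic box L = 4.0
lattice = [(float(x), float(y), float(z))
           for x in range(4) for y in range(4) for z in range(4)]
r, g = rdf(lattice, box_length=4.0, r_max=2.0, n_bins=40)
first_shell = next(b for b, value in enumerate(g) if value > 0)
print(f"first coordination shell at r ~ {r[first_shell]:.3f}")
```

Integrating density · g(r) · shell volume over the first peak recovers the coordination number (6 for simple cubic), which is a useful sanity check on any RDF implementation.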

Tools Used:
- Python: PyTorch, SchNetPack, ASE
- Data: MD17 dataset
- Visualization: matplotlib, TensorBoard

📕 Go to MLP Introduction Series →


Which Series is Right for You?

graph TD
    Start[Start Learning<br/>What interests you?] --> Q1{Materials or Process?}

    Q1 -->|Materials Development| Q2{Size Scale?}
    Q1 -->|Chemical Process Optimization| PI[📙 PI Introduction<br/>Process Informatics]

    Q2 -->|Nanoscale<br/>1-100 nm| NM[📗 NM Introduction<br/>Nanomaterials]
    Q2 -->|General Materials<br/>Database Utilization| MI[📘 MI Introduction<br/>Materials Informatics]
    Q2 -->|Molecular Level<br/>Simulation| MLP[📕 MLP Introduction<br/>Machine Learning Potentials]

    MI --> Next[Next Steps]
    NM --> Next
    PI --> Next
    MLP --> Next

    Next --> Advanced[Deepen Your Knowledge:<br/>Complete Other Series]

    style Start fill:#e3f2fd
    style Q1 fill:#fff3e0
    style Q2 fill:#f3e5f5
    style MI fill:#e3f2fd
    style NM fill:#fff4e1
    style PI fill:#f3e5f5
    style MLP fill:#e8f5e9
    style Next fill:#ffebee
    style Advanced fill:#f3e5f5

Learning Roadmap

🎓 For Beginners (2-4 Week Plan)

Week 1-2: Foundation Building
1. Complete MI Introduction (90-120 minutes)
- Understand materials science Γ— machine learning basics
- Set up Python coding environment
- Master Materials Project API usage

Week 3: Choose Application Area
2. Select one based on your interests:
- NM Introduction: Interested in nanotech → Nanoparticles, graphene
- PI Introduction: Interested in chemical engineering → Process optimization
- MLP Introduction: Interested in computational chemistry → Molecular simulation

Week 4: Horizontal Expansion
3. Choose 1-2 remaining series of interest
4. Focus on Chapter 4 (Real-World Applications) of each series

Deliverables:
- 4-6 Python projects (GitHub portfolio)
- Personal career roadmap (3 months/1 year/3 years)


🚀 For Experienced Learners (1-2 Week Plan)

Prerequisites: Python, machine learning basics, materials science or chemical engineering fundamentals

Day 1-2: Rapid Learning Mode
- Skim Chapter 2 (Foundational Knowledge) of each series
- Focus on MI-specific concepts (descriptors, databases)

Day 3-5: Intensive Practice
- Fully implement Chapter 3 (Hands-On) of series of interest
- Execute all code examples and verify behavior with parameter changes

Day 6-7: Applications and Career Design
- Thoroughly read Chapter 4 (Real-World Applications) of each series
- Concretize applications to your research/work
- Plan next steps (papers, projects, conferences)

Deliverables:
- Advanced implementation projects (with hyperparameter tuning)
- Application plan for real work


🎯 Targeted Learning (Flexible)

For those seeking specific skills or knowledge

Master database utilization:
- MI Introduction β†’ Chapter 2 (Database comparison) + Chapter 3 (Materials Project API)

Master Bayesian optimization:
- NM Introduction β†’ Chapter 3 (Bayesian optimization implementation)
- PI Introduction β†’ Chapter 3 (Reaction condition optimization)
- MLP Introduction β†’ Chapter 2 (Active Learning)

Learn industrial applications:
- Cross-sectional study of Chapter 4 across all series
- Choose from 20+ case studies closest to your industry

Career planning:
- Compare Chapter 4 (Career Paths) across all series
- Understand the differences among academia, industry, and startups


📊 Series Comparison Table

| Series | Target Area | Difficulty | Learning Time | Code Examples | Prerequisites | Key Tools | Industrial Applications |
|---|---|---|---|---|---|---|---|
| 📘 MI | General Materials | Beginner-Advanced | 90-120 min | 35 | High school math, Python basics | scikit-learn, matminer, Materials Project | Li-ion batteries, catalysts, high-entropy alloys |
| 📗 NM | Nanomaterials | Beginner-Intermediate | 90-120 min | 30-35 | University physics/chemistry | pandas, LightGBM, scikit-optimize | CNT composites, quantum dots, nanomedicine |
| 📙 PI | Chemical Processes | Beginner-Advanced | 90-120 min | 35 | Chemical engineering basics | scikit-learn, Prophet, scipy | Petrochemicals, pharmaceuticals, bioprocesses |
| 📕 MLP | Molecular Simulation | Beginner-Advanced | 85-100 min | 15 | Quantum chemistry basics | PyTorch, SchNetPack, ASE | Drug discovery, catalyst design, materials design |

Difficulty Γ— Application Area Matrix

graph LR
    subgraph Beginner Level
        MI1[MI: Ch1-2<br/>Basic Concepts]
        NM1[NM: Ch1-2<br/>Size Effects]
        PI1[PI: Ch1-2<br/>Process Basics]
        MLP1[MLP: Ch1-2<br/>DFT vs MLP]
    end

    subgraph Intermediate Level
        MI2[MI: Ch3<br/>Python Implementation]
        NM2[NM: Ch3<br/>Bayesian Optimization]
        PI2[PI: Ch3<br/>Multi-objective Optimization]
        MLP2[MLP: Ch3<br/>SchNetPack]
    end

    subgraph Advanced Level
        MI3[MI: Ch4<br/>Industrial Applications]
        NM3[NM: Ch4<br/>Case Studies]
        PI3[PI: Ch4<br/>Digital Twins]
        MLP3[MLP: Ch4<br/>Foundation Models]
    end

    MI1 --> MI2 --> MI3
    NM1 --> NM2 --> NM3
    PI1 --> PI2 --> PI3
    MLP1 --> MLP2 --> MLP3

    style MI1 fill:#e3f2fd
    style MI2 fill:#bbdefb
    style MI3 fill:#90caf9
    style NM1 fill:#fff4e1
    style NM2 fill:#ffe0b2
    style NM3 fill:#ffcc80
    style PI1 fill:#f3e5f5
    style PI2 fill:#e1bee7
    style PI3 fill:#ce93d8
    style MLP1 fill:#e8f5e9
    style MLP2 fill:#c8e6c9
    style MLP3 fill:#a5d6a7

🌐 Shared Learning Resources

Online Courses

Key Textbooks and Review Articles

  1. Rajan, K. (2013). Materials Informatics. Materials Today.
  2. Lookman, T., et al. (2018). Information Science for Materials Discovery and Design. Springer. DOI: 10.1007/978-3-319-23871-5
  3. Behler, J. (2016). Perspective: Machine learning potentials for atomistic simulations. J. Chem. Phys. DOI: 10.1063/1.4966192
  4. Cao, G. & Wang, Y. (2011). Nanostructures and Nanomaterials. World Scientific.
  5. Seborg, D. E., et al. (2016). Process Dynamics and Control (4th ed.). Wiley.

Major Databases & Tools

Materials Databases:
- Materials Project - 140k+ materials, DFT calculations
- AFLOW - Crystal structure focused, 3.5M structures
- OQMD - DFT calculations, 815k materials
- NOMAD - Large-scale DFT database

Python Libraries:
- pymatgen - Foundational library for materials analysis
- matminer - Feature engineering
- SchNetPack - Machine learning potentials
- ASE - Atomic Simulation Environment

Visualization Tools:
- matplotlib, seaborn, plotly
- TensorBoard (training visualization)
- VESTA (crystal structure visualization)

Communities

Japan:
- Japan Society of Materials Science (JSMS)
- Materials Research Society - Japan (MRS-J)
- Society of Chemical Engineers, Japan (SCEJ)
- Japan Society of Computational Chemistry
- Molecular Science Society of Japan

International:
- Materials Research Society (MRS)
- American Institute of Chemical Engineers (AIChE)
- American Chemical Society (ACS)
- European Materials Research Society (E-MRS)
- CECAM (Computational Molecular Science)
- MolSSI (Molecular Sciences Software Institute)

Major Conferences


❓ FAQ (Frequently Asked Questions)

Q1: Which series should I start with?

A: Choose based on your background and interests:

For complete beginners, MI Introduction is strongly recommended. You'll learn database utilization methods like Materials Project, which forms the foundation for other series.


Q2: Can I study multiple series in parallel?

A: Possible, but not recommended. Reasons:

Recommended approach:
1. Fully master one series first (1-2 weeks)
2. Publish as portfolio on GitHub
3. Move to next series
4. Aim to complete all series in 2-4 weeks total


Q3: Can Python beginners learn from these series?

A: Yes, if you understand basic syntax:

Required skills:
- Variables, data types (int, float, str, list, dict)
- Function definition and calling
- Loops (for, while) and conditionals (if/else)
- Library installation and import

Recommended pre-learning (if no Python experience):
1. Python Official Tutorial (Chapters 1-4, 5-10 hours)
2. Codecademy Python Course (free trial)
3. Write 5-10 simple Python programs

Chapter 3 of each series includes detailed code comments designed for beginner comprehension.


Q4: How is this applied in industry?

A: Chapter 4 of each series presents detailed case studies. Major applications:

MI Applications:
- Tesla/Panasonic: Li-ion battery material optimization (+20% capacity, 67% shorter development)
- Toyota: Pt-free catalyst development (80% cost reduction, 120% activity)
- Boeing/Airbus: High-entropy alloys (20% weight reduction)

NM Applications:
- Mitsubishi Chemical: CNT composite materials (35% strength improvement, 60% shorter development)
- Samsung: Quantum dot displays (25% wider color gamut)
- Pfizer: Nanomedicine drug delivery (50% fewer side effects)

PI Applications:
- Mitsubishi Chemical: Catalytic process optimization (yield 70%→85%, +2 billion yen annual revenue)
- Asahi Kasei: Polymerization reaction control (defect rate 5%→1%, 500 million yen/year less waste)
- Takeda Pharmaceutical: Drug batch process (passed FDA inspection on the first attempt, market entry 3 months earlier)

MLP Applications:
- MIT/SLAC: Cu catalyst CO₂ reduction (reaction pathway elucidation, 50,000× speedup)
- Schrödinger/Pfizer: Protein folding (50% shorter drug development)
- NIMS: GaN semiconductor crystal growth (90% defect reduction, 30% cost reduction)

ROI (Return on Investment) examples:
- Development time reduction: 50-90% decrease
- Cost reduction: 30-80%
- Performance improvement: 20-120%
- Initial investment payback: 1-3 years


Q5: What are career paths after learning?

A: Three major paths:

Path 1: Academia (Researcher)

Path 2: Industrial R&D

Path 3: Startup/Consulting

Chapter 4 of each series details specific career paths, salary data, required skills, and learning timelines.


Q6: What code execution environment is needed?

A: Three options:

Option 1: Anaconda

Option 2: venv (Python standard)

Option 3: Google Colab (Most convenient)

Recommendation: Start with Google Colab, migrate to Anaconda for serious learning.

GPU necessity:
- MI/NM/PI: A CPU is sufficient (training takes from a few minutes to tens of minutes)
- MLP: A GPU is strongly recommended (10-100× shorter training time)


Q7: How independent are the series?

A: Each series can be studied independently, but some common concepts exist:

Common concepts (appear in all series):
- Machine learning basics (regression, classification, optimization)
- Basic Python libraries (numpy, pandas, matplotlib)
- Data preprocessing, feature engineering
- Model evaluation (MAE, R², cross-validation)
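The shared evaluation concepts fit in a few lines. The plain-Python sketch below computes MAE, the coefficient of determination R² = 1 - SS_res/SS_tot, and contiguous (unshuffled) k-fold splits. The helper names are hypothetical; in practice scikit-learn's `mean_absolute_error`, `r2_score`, and `KFold` cover the same ground.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous, unshuffled folds.

    The first n_samples % k folds get one extra sample.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print("MAE:", mae(y_true, y_pred))
print("R2:", r2(y_true, y_pred))
for train, test in k_fold_indices(10, 3):
    print("train:", train, "test:", test)
```

For materials datasets sorted by composition or synthesis date, shuffling (or grouping by material family) before splitting matters, because contiguous folds can hide or exaggerate extrapolation error.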

Series-specific concepts:
- MI: Material descriptors, Materials Project API, crystal structures
- NM: Size effects, quantum confinement, nanoparticle synthesis
- PI: Process parameters, time series analysis, multi-objective optimization
- MLP: Potential energy surfaces, DFT, symmetry functions, graph neural networks

Interrelationships:

MI (Foundation) → NM (Application 1)
                → PI (Application 2)
                → MLP (Application 3)

Learning MI first makes understanding other series 30-40% faster.


Q8: Is commercial use permitted?

A: Depends on libraries and data:

✅ Commercial use allowed (MIT License):

⚠️ Requires verification (possibly academic use only):

📌 When considering corporate use:

  1. Check dataset licenses
  2. Train models with company data (safest)
  3. Verify open source library commercial use terms
  4. Consult legal department

Each series FAQ provides detailed license information.


Q9: What are the update plans for the series?

A: Continuous improvement and expansion planned:

Short-term (1-3 months):
- Bug fixes, typo corrections
- Additional code examples (community requests)
- New case studies

Medium-term (3-6 months):
- Consider new series:
- Chemoinformatics (CI) Introduction
- Bioinformatics (BI) Introduction
- Data-Driven Materials Design (DDMD) Introduction
- Interactive Jupyter Notebook versions
- Video tutorials

Long-term (6-12 months):
- Learning platform development (progress tracking features)
- Community forum
- Certification program

Feedback welcome! Please submit requests for new topics or improvement suggestions via GitHub repository Issues or email (yusuke.hashimoto.b8@tohoku.ac.jp).


🚀 Next Steps

Immediate (Within 1-2 weeks)

  1. ✅ Create a GitHub/GitLab portfolio
    - Publish code implemented in each series with README
    - Include datasets, result visualizations, analysis
    - Examples: "MI-battery-optimization", "MLP-catalyst-simulation"

  2. ✅ Update LinkedIn profile
    - Add skills: "Materials Informatics", "Machine Learning", "Python", "PyTorch"
    - Add projects with GitHub links

  3. ✅ Share your learning record on a blog/Qiita
    - Output what you learned
    - Get feedback from community

Short-term (1-3 months)

  1. ✅ Participate in Kaggle competitions
    - Materials science competitions: "Predicting Molecular Properties", "Materials Discovery"
    - Improve practical data science skills

  2. ✅ Present at domestic conferences
    - JSMS, SCEJ, Computational Chemistry Society
    - Start with poster presentations (lower barrier)

  3. ✅ Execute an independent project
    - Apply MI/NM/PI/MLP to your research theme
    - Combine experimental data + machine learning

  4. ✅ Contribute to open source
    - Bug reports/feature additions for pymatgen, matminer, SchNetPack
    - Documentation translation (Japanese localization)

Medium-term (3-6 months)

  1. ✅ Read 10 papers thoroughly
    - Nature Materials, Advanced Materials, npj Computational Materials
    - J. Chem. Phys., JCTC, Computers & Chemical Engineering

  2. ✅ Internship/collaborative research
    - Companies: Mitsubishi Chemical, Toyota, Panasonic, etc.
    - Research institutions: NIMS, AIST

  3. ✅ Oral presentation at domestic conference
    - More advanced than poster, deeper discussion through Q&A

Long-term (1+ years)

  1. ✅ Present at international conferences
    - MRS Fall/Spring Meeting, E-MRS, ACS, PSE
    - English presentations, networking

  2. ✅ Submit peer-reviewed paper
    - npj Computational Materials (open access)
    - J. Chem. Phys., Ind. Eng. Chem. Res.

  3. ✅ Career transition
    - Academia: PhD program, postdoc, assistant professor
    - Industry: Data scientist, MI engineer
    - Startup: Founding, joining

  4. ✅ Next generation development
    - Organize study groups/workshops
    - Mentor junior colleagues
    - Contribute to community


📞 Feedback and Support

About AI Terakoya

This platform was created as part of the MI Knowledge Hub project under Dr. Yusuke Hashimoto, Institute of Multidisciplinary Research for Advanced Materials, Tohoku University.

Philosophy:
- Provide accessible convergence of data science and materials science
- Educational content balancing theory and practice
- Open learning community formation

Created: October 17, 2025
Version: 1.0
Total Content: 16 chapters, 115 code examples, 20 case studies

We Welcome Your Feedback

To improve this platform, we await your feedback:

Contact:
📧 Email: yusuke.hashimoto.b8@tohoku.ac.jp
🐙 GitHub: @YusukeHashimotoPhD
🔗 LinkedIn: Dr. Yusuke Hashimoto

Join the Community

Japanese Community:
- JSMS MI Forum
- Computational Chemistry Society ML Division
- MI Study Group Slack (Participation link: apply via email)

International Community:
- Materials Project Forum
- MolSSI Discussion
- CECAM Community


📜 License and Terms of Use

All content on this platform is published under CC BY 4.0 (Creative Commons Attribution 4.0 International) license.

What You Can Do

✅ Free viewing and downloading
✅ Educational use (university classes, corporate training, study groups, etc.)
✅ Modification and derivative works (translation, summarization, slide creation, etc.)
✅ Research and development use (papers, projects, product development)

Conditions

📌 Author credit required
📌 Note modifications if made
📌 Please contact the author before commercial use (no contact needed when the content is provided free of charge)

Citation Methods

In papers:

Hashimoto, Y. (2025). AI Terakoya - Materials Informatics Knowledge Hub.
Tohoku University. https://yusukehashimotolab.github.io/wp/knowledge/

BibTeX:

@misc{hashimoto2025aiterakoya,
  author = {Hashimoto, Yusuke},
  title = {AI Terakoya - Materials Informatics Knowledge Hub},
  year = {2025},
  publisher = {Tohoku University},
  url = {https://yusukehashimotolab.github.io/wp/knowledge/}
}

On websites/blogs:

Source: AI Terakoya - Materials Informatics Knowledge Hub (Dr. Yusuke Hashimoto, Tohoku University)
https://yusukehashimotolab.github.io/wp/knowledge/

Details: Full CC BY 4.0 License


🎓 Let's Start Learning!

Are you ready? Choose the series that suits you best and begin your journey into the world of data-driven materials development!

🔰 Complete Beginners → Start with 📘 MI Introduction Series
⚗️ Nanotech Interest → Start with 📗 NM Introduction Series
🏭 Chemical Engineering Background → Start with 📙 PI Introduction Series
🧪 Computational Chemistry Experience → Start with 📕 MLP Introduction Series


Update History


Your MI learning journey begins here!
Welcome to the future of data-driven materials development.