Techniques for extracting valuable insights from unlabeled data
Series Overview
This series is practical educational content in four chapters that lets you learn unsupervised learning step by step, starting from the basics.
Unsupervised Learning is a machine learning approach that discovers hidden patterns and structures in data without correct labels. Through techniques such as clustering, dimensionality reduction, and anomaly detection, it enables data understanding, visualization, compression, and the detection of unusual observations, and it is widely used in data analysis, marketing, security, and many other fields.
Features:
- ✅ From basics to practice: Systematic learning from clustering fundamentals to customer segmentation
- ✅ Implementation-focused: 35+ executable Python code examples and a hands-on practical project
- ✅ Intuitive understanding: Understanding algorithm behavior through visualization
- ✅ scikit-learn utilization: Latest implementation methods using industry-standard libraries
- ✅ Practical project: Real-world problem solving through customer segmentation
Total Learning Time: 70-90 minutes (including code execution and exercises)
How to Study
Recommended Learning Order
For Beginners (completely new to unsupervised learning):
- Chapter 1 → Chapter 2 → Chapter 3 → Chapter 4 (all chapters recommended)
- Duration: 70-90 minutes
For intermediate learners (with machine learning experience):
- Chapter 2 → Chapter 3 → Chapter 4
- Duration: 50-60 minutes
For practical skill enhancement (implementation focus rather than theory):
- Chapter 4 (intensive learning)
- Duration: 25-30 minutes
Chapter Details
Chapter 1: Clustering Fundamentals
Difficulty: Beginner
Reading Time: 20-25 minutes
Code Examples: 10
Learning Content
- What is Clustering - Techniques for grouping data
- K-means Algorithm - The most basic clustering method (see the sketch after this list)
- Hierarchical Clustering - Dendrograms and hierarchical cluster structure
- DBSCAN Algorithm - Density-based clustering
- Cluster Evaluation - Silhouette coefficient, elbow method
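Below is a minimal, self-contained sketch of the kind of code Chapter 1 builds up: K-means on synthetic data, scored with the silhouette coefficient. The toy dataset, k=3, and the random seeds are illustrative assumptions, not the chapter's actual example.

```python
# Minimal sketch (not the chapter's exact example): K-means on toy data,
# evaluated with the silhouette coefficient.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic 2D data with three well-separated groups (illustrative assumption)
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

# Fit K-means and assign each point to a cluster
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# Silhouette ranges from -1 to 1; values near 1 indicate well-separated clusters
print("silhouette:", round(silhouette_score(X, labels), 3))
```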
Learning Objectives
- ✅ Understand the concept and application examples of clustering
- ✅ Implement K-means in Python
- ✅ Create dendrograms with hierarchical clustering
- ✅ Detect arbitrarily shaped clusters with DBSCAN
- ✅ Determine the appropriate number of clusters
Chapter 2: Introduction to Dimensionality Reduction
Difficulty: Beginner to Intermediate
Reading Time: 20-25 minutes
Code Examples: 9
Learning Content
- What is Dimensionality Reduction - Visualization and compression of high-dimensional data
- Principal Component Analysis (PCA) - Linear transformation that maximizes variance (a PCA/t-SNE sketch follows this list)
- t-SNE - Nonlinear dimensionality reduction and visualization
- UMAP - Fast and flexible dimensionality reduction
- Application Examples - Visualization of image data and text data
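As a rough preview of the chapter's workflow, here is a small sketch that projects the 64-dimensional scikit-learn digits dataset to 2D with PCA and t-SNE; the dataset choice and parameter values are illustrative assumptions rather than the chapter's exact example.

```python
# Minimal sketch (illustrative assumptions): PCA and t-SNE on the digits dataset.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)  # 1797 samples, 64 features

# PCA: linear projection onto the two directions of maximum variance
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
print("variance explained by 2 components:", pca.explained_variance_ratio_.sum())

# t-SNE: nonlinear embedding that preserves local neighborhood structure
X_tsne = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(X)
print(X_pca.shape, X_tsne.shape)  # both (1797, 2)
```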
Learning Objectives
- ✅ Explain the purpose and benefits of dimensionality reduction
- ✅ Extract principal components with PCA
- ✅ Visualize high-dimensional data in 2D with t-SNE
- ✅ Understand the features of UMAP and differences from t-SNE
- ✅ Select appropriate dimensionality reduction methods
Chapter 3: Anomaly Detection
Difficulty: Intermediate
Reading Time: 15-20 minutes
Code Examples: 8
Learning Content
- What is Anomaly Detection - Detecting deviations from normal patterns
- Statistical Methods - Z-score, Interquartile Range (IQR)
- Isolation Forest - Exploiting how easily anomalous points are isolated (see the sketch after this list)
- One-Class SVM - Learning the boundary of normal data
- Application Examples - Fraud detection, system monitoring, quality control
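The sketch below illustrates two of these ideas side by side: a simple z-score rule and scikit-learn's Isolation Forest, both run on synthetic data with a few injected outliers. The data and the 1% contamination rate are illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions): z-score rule vs. Isolation Forest.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
X[:5] += 6.0  # inject a few obvious outliers

# Statistical rule: flag any point with |z| > 3 in some feature
z = (X - X.mean(axis=0)) / X.std(axis=0)
z_flags = np.abs(z).max(axis=1) > 3

# Isolation Forest: anomalies need fewer random splits to be isolated
iso = IsolationForest(contamination=0.01, random_state=0)
iso_labels = iso.fit_predict(X)  # -1 = anomaly, 1 = normal

print("z-score outliers:", int(z_flags.sum()),
      "| Isolation Forest outliers:", int((iso_labels == -1).sum()))
```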
Learning Objectives
- ✅ Understand types of anomaly detection and application areas
- ✅ Detect outliers with statistical methods
- ✅ Implement Isolation Forest
- ✅ Define normal regions with One-Class SVM
- ✅ Select appropriate anomaly detection methods
Chapter 4: Practical Project - Customer Segmentation
Difficulty: Intermediate
Reading Time: 25-30 minutes
Code Examples: 10
Learning Content
- Project Overview - Analysis and grouping of customer data
- Data Preprocessing - Missing value handling, normalization, feature engineering
- Exploratory Data Analysis (EDA) - Understanding data distribution and correlation
- Clustering Implementation - Comparison of K-means and hierarchical clustering (a minimal pipeline sketch follows this list)
- Visualization through Dimensionality Reduction - Visualizing clusters with PCA and t-SNE
- Segment Interpretation - Deriving business value
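To give a feel for the overall pipeline shape described above, here is a compact sketch: scale hypothetical customer features, cluster them with K-means, and project to 2D with PCA for plotting. The DataFrame, its columns, and the number of clusters are placeholders, not the chapter's real dataset.

```python
# Minimal sketch (hypothetical data): scale -> cluster -> project for plotting.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Placeholder customer features; the chapter works with a real dataset
customers = pd.DataFrame({
    "annual_spend":     [1200, 300, 4500, 800, 5200, 150],
    "visits_per_month": [4, 1, 12, 3, 15, 1],
    "avg_basket_size":  [30, 25, 60, 28, 70, 20],
})

# Preprocess: put all features on a comparable scale
X = StandardScaler().fit_transform(customers)

# Cluster and attach segment labels back to the DataFrame
customers["segment"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Reduce to 2D (e.g. for a scatter plot of the segments)
X_2d = PCA(n_components=2).fit_transform(X)

# Per-segment averages are the starting point for business interpretation
print(customers.groupby("segment").mean())
```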
Learning Objectives
- ✅ Appropriately preprocess customer data
- ✅ Compare and evaluate multiple clustering methods
- ✅ Visualize and interpret clusters
- ✅ Analyze characteristics of each segment
- ✅ Obtain insights that can be utilized in business strategies
Overall Learning Outcomes
Upon completing this series, you will acquire the following skills and knowledge:
Knowledge Level (Understanding)
- ✅ Explain basic concepts and application areas of unsupervised learning
- ✅ Understand the mechanisms of clustering, dimensionality reduction, and anomaly detection
- ✅ Explain the characteristics and use cases of each method
- ✅ Intuitively understand the operating principles of algorithms
- ✅ Understand scikit-learn's API and design philosophy (see the sketch below)
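The last item refers to scikit-learn's consistent estimator interface: hyperparameters go in the constructor, fit() learns from data, and transform()/predict() apply the fitted model. A small sketch of that shared pattern, using placeholder data:

```python
# scikit-learn's shared estimator pattern, shown on placeholder data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X = np.random.RandomState(0).rand(100, 5)

pca = PCA(n_components=2)      # 1) configure hyperparameters in the constructor
X_2d = pca.fit_transform(X)    # 2) fit to data, then transform (here in one step)

km = KMeans(n_clusters=3, n_init=10, random_state=0)
km.fit(X_2d)                   # 2) fit learns the cluster centers
labels = km.predict(X_2d)      # 3) predict assigns clusters to new or existing data
print(labels[:10])
```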
Practical Skills (Doing)
- ✅ Implement K-means, DBSCAN, and hierarchical clustering
- ✅ Visualize high-dimensional data with PCA, t-SNE, and UMAP
- ✅ Detect anomalies with Isolation Forest and One-Class SVM
- ✅ Calculate and interpret cluster evaluation metrics
- ✅ Build a complete pipeline for customer segmentation
Application Ability (Applying)
- ✅ Select appropriate unsupervised learning methods for new problems
- ✅ Customize methods according to data characteristics
- ✅ Visualize results and derive business value
- ✅ Design hybrid methods combining supervised and unsupervised learning
Prerequisites
To learn this series effectively, it helps to have the following knowledge:
Required (Must Have)
- ✅ Python Basics: Variables, functions, loops, conditionals
- ✅ NumPy Basics: Array manipulation, basic mathematical functions
- ✅ Machine Learning Overview: Differences between supervised and unsupervised learning
Recommended (Nice to Have)
- 💡 Pandas Basics: DataFrame manipulation, data preprocessing
- 💡 Matplotlib/Seaborn: Basic graph creation
- 💡 Linear Algebra Basics: Vectors, matrices, dot products (useful for understanding PCA)
- 💡 Statistics Basics: Mean, variance, standard deviation
Recommended prior learning:
- 📚 Basic concepts of machine learning
- 📚 Data Preprocessing (Coming Soon) - Data cleaning and feature engineering
Technologies and Tools Used
Main Libraries
- scikit-learn 1.3+ - Clustering, dimensionality reduction, anomaly detection
- NumPy 1.24+ - Numerical computation
- Pandas 2.0+ - Data manipulation
- Matplotlib 3.7+ - Visualization
- seaborn 0.12+ - Statistical visualization
- umap-learn - UMAP dimensionality reduction (optional)
Development Environment
- Python 3.8+ - Programming language
- Jupyter Notebook / Lab - Interactive development environment
- Google Colab - Cloud environment (free GPU available)
Let's Get Started!
Are you ready? Start with Chapter 1 and begin your journey into the world of unsupervised learning!
Chapter 1: Clustering Fundamentals →
Next Steps
After completing this series, we recommend proceeding to the following topics:
Deep Dive Learning
- 📚 Deep Learning for Unsupervised Learning: Autoencoder, VAE, GAN
- 📚 Time Series Unsupervised Learning: Dynamic Time Warping (DTW), time series clustering
- 📚 Text Analysis: Topic modeling (LDA), Word2Vec
- 📚 Graph Clustering: Community detection, network analysis
Related Series
- 🎯 Introduction to Neural Networks - Bridge to Autoencoders
- 🎯 Generative models (VAE, GAN)
- 🎯 ML Deployment (Coming Soon) - Deployment to production
Practical Projects
- 🚀 Image Segmentation - Clustering of image data
- 🚀 Recommendation System - Collaborative filtering
- 🚀 Fraud Detection System - Implementation of anomaly detection
- 🚀 Document Classification - Text clustering
Update History
- 2025-10-20: v1.0 Initial release
Your journey into unsupervised learning starts here!