
Chapter 1: Model Interpretability Basics

Understanding Interpretability for Building Trustworthy AI Systems

📖 Reading Time: 30-35 minutes 📊 Difficulty: Beginner 💻 Code Examples: 8 📝 Exercises: 6

This chapter introduces the basics of model interpretability. You will learn why interpretability matters, how interpretability methods are classified, and what characterizes interpretable models.

Learning Objectives

By reading this chapter, you will be able to:

  1. Explain why model interpretability matters for trust, accountability, regulation, debugging, and bias detection
  2. Distinguish global vs. local, model-specific vs. model-agnostic, and intrinsic vs. post-hoc interpretability
  3. Describe the characteristics of interpretable models such as linear regression, decision trees, rule-based models, and GAMs
  4. Outline major interpretation techniques including Feature Importance, PDP, SHAP, LIME, and saliency maps
  5. Evaluate interpretations in terms of fidelity, consistency, stability, and comprehensibility

1.1 Why Model Interpretability Matters

Trust and Accountability

To trust a machine learning model's predictions, we need to understand why the model made them. Accountability is especially essential in high-stakes decision-making (medical diagnosis, loan approval, criminal justice, etc.).

| Application Domain | Why Interpretability is Needed | Risks |
|---|---|---|
| Medical Diagnosis | Doctors need to understand diagnostic reasoning and explain it to patients | Life-threatening misdiagnosis |
| Loan Approval | Obligation to explain rejection reasons, ensure fairness | Discriminatory decisions, legal litigation |
| Criminal Justice | Need to show the basis for recidivism risk assessments | Unjust verdicts, human rights violations |
| Autonomous Vehicles | Accountability in accidents, safety verification | Loss of life, legal liability |

Important: "High prediction accuracy" alone is insufficient. For stakeholders to trust and properly use models, they need to understand the basis for predictions.

Regulatory Requirements (GDPR, AI Regulations)

Regulations regarding machine learning model transparency are being strengthened worldwide:

  1. GDPR (EU): Individuals subject to automated decision-making have the right to meaningful information about the logic involved (Articles 13-15, 22)
  2. EU AI Act: Imposes transparency, documentation, and human-oversight requirements on high-risk AI systems
  3. Sector-specific rules: For example, US fair-lending regulations require lenders to state the principal reasons for adverse credit decisions

Debugging and Model Improvement

Interpretability is also essential for improving model performance:

# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0
# - pandas>=2.0.0, <2.2.0

"""
Example: Diagnosing unexpected model predictions

Problem: Customer churn prediction model performs poorly in production
"""

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Generate sample data
np.random.seed(42)
n_samples = 1000

data = pd.DataFrame({
    'age': np.random.randint(18, 80, n_samples),
    'tenure_months': np.random.randint(1, 120, n_samples),
    'monthly_charges': np.random.uniform(20, 150, n_samples),
    'total_charges': np.random.uniform(100, 10000, n_samples),
    'num_support_calls': np.random.poisson(2, n_samples),
    'contract_type': np.random.choice(['month', 'year', '2year'], n_samples)
})

# Target variable (churn)
data['churn'] = ((data['num_support_calls'] > 3) |
                 (data['monthly_charges'] > 100)).astype(int)

# Data leak: IDs were assigned after sorting by churn, so customer_id
# implicitly encodes the target
data = data.sort_values('churn').reset_index(drop=True)
data['customer_id'] = np.arange(n_samples)

# Train model
X = data.drop('churn', axis=1)
X_encoded = pd.get_dummies(X, columns=['contract_type'])
y = data['churn']

X_train, X_test, y_train, y_test = train_test_split(
    X_encoded, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Diagnose with Feature Importance
feature_importance = pd.DataFrame({
    'feature': X_encoded.columns,
    'importance': model.feature_importances_
}).sort_values('importance', ascending=False)

print("Feature Importance:")
print(feature_importance.head(10))

# Problem discovered: customer_id has highest importance (data leak)
print("\n⚠️ Abnormally high importance for customer_id → Possible data leakage")

Bias Detection

Interpretability allows us to discover unfair patterns learned by the model:

# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0
# - pandas>=2.0.0, <2.2.0

"""
Example: Bias detection in hiring screening model
"""

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Sample data with bias
np.random.seed(42)
n_samples = 1000

data = pd.DataFrame({
    'years_experience': np.random.randint(0, 20, n_samples),
    'education_level': np.random.randint(1, 5, n_samples),
    'skills_score': np.random.uniform(0, 100, n_samples),
    'gender': np.random.choice(['M', 'F'], n_samples),
    'age': np.random.randint(22, 65, n_samples)
})

# Biased target (includes gender discrimination)
data['hired'] = (
    (data['years_experience'] > 5) &
    (data['skills_score'] > 60) &
    (data['gender'] == 'M')  # Gender bias
).astype(int)

# Train model
X = pd.get_dummies(data.drop('hired', axis=1), columns=['gender'])
y = data['hired']

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

model = LogisticRegression(random_state=42)
model.fit(X_scaled, y)

# Check coefficients to detect bias
coefficients = pd.DataFrame({
    'feature': X.columns,
    'coefficient': model.coef_[0]
}).sort_values('coefficient', ascending=False)

print("Model Coefficients:")
print(coefficients)

# Abnormally high coefficient for gender_M → Detect gender bias
print("\n⚠️ High coefficient for gender_M → Possible gender discrimination")
print("📊 Fairness evaluation required")

1.2 Classification of Interpretability

Global Interpretation vs Local Interpretation

| Classification | Description | Question | Example Methods |
|---|---|---|---|
| Global Interpretation | Understanding overall model behavior | "How does the model predict in general?" | Feature Importance, Partial Dependence |
| Local Interpretation | Explaining individual predictions | "Why was this customer predicted to churn?" | LIME, SHAP, Counterfactual |

# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0
# - pandas>=2.0.0, <2.2.0

"""
Example: Global Interpretation vs Local Interpretation
"""

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

# Generate sample data
np.random.seed(42)
n_samples = 500

data = pd.DataFrame({
    'age': np.random.randint(18, 70, n_samples),
    'income': np.random.uniform(20000, 150000, n_samples),
    'debt_ratio': np.random.uniform(0, 1, n_samples),
    'credit_history_months': np.random.randint(0, 360, n_samples)
})

# Target: Loan approval
data['approved'] = (
    (data['income'] > 50000) &
    (data['debt_ratio'] < 0.5) &
    (data['credit_history_months'] > 24)
).astype(int)

X = data.drop('approved', axis=1)
y = data['approved']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(X_train, y_train)

# --- Global Interpretation: Feature Importance ---
print("=== Global Interpretation ===")
print("Most important features for the overall model:")
feature_importance = pd.DataFrame({
    'feature': X.columns,
    'importance': model.feature_importances_
}).sort_values('importance', ascending=False)
print(feature_importance)

# --- Local Interpretation: Individual prediction explanation ---
print("\n=== Local Interpretation ===")
# Select one sample from test data
sample_idx = 0
sample = X_test.iloc[sample_idx:sample_idx+1]
prediction = model.predict(sample)[0]
prediction_proba = model.predict_proba(sample)[0]

print(f"Features of sample {sample_idx}:")
print(sample.T)
print(f"\nPrediction: {'Approved' if prediction == 1 else 'Rejected'}")
print(f"Probability: {prediction_proba[1]:.2%}")

# This only lists the sample's feature values; for a true per-feature
# contribution breakdown, use SHAP or LIME
print("\nFeature values for this sample (use SHAP/LIME for contributions):")
for feature in X.columns:
    print(f"  {feature}: {sample[feature].values[0]:.2f}")

Model-Specific vs Model-Agnostic

| Classification | Description | Advantages | Disadvantages |
|---|---|---|---|
| Model-Specific | Interpretation methods tied to a particular model class | Accurate, efficient | Cannot be applied to other models |
| Model-Agnostic | Applicable to any model | High versatility | May have high computational cost |

Intrinsic Interpretability vs Post-hoc Interpretability

| Classification | Description | Examples |
|---|---|---|
| Intrinsic Interpretability | The model is interpretable by design | Linear regression, decision trees, rule-based models |
| Post-hoc Interpretability | Explanations are generated after the model is trained | Feature Importance, LIME, SHAP |

Taxonomy of Interpretability

graph TB
    A[Model Interpretability] --> B[Scope]
    A --> C[Dependency]
    A --> D[Timing]
    B --> B1[Global Interpretation<br/>Overall model behavior]
    B --> B2[Local Interpretation<br/>Individual prediction explanation]
    C --> C1[Model-Specific<br/>For specific models]
    C --> C2[Model-Agnostic<br/>General purpose]
    D --> D1[Intrinsic Interpretability<br/>Inherently interpretable]
    D --> D2[Post-hoc Interpretability<br/>After-the-fact explanation]
    style A fill:#7b2cbf,color:#fff
    style B1 fill:#e3f2fd
    style B2 fill:#e3f2fd
    style C1 fill:#fff3e0
    style C2 fill:#fff3e0
    style D1 fill:#c8e6c9
    style D2 fill:#c8e6c9

1.3 Interpretable Models

Linear Regression

Linear regression is one of the most easily interpretable models. The coefficient of each feature directly indicates its influence.

Formula:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n$$

$\beta_i$ indicates the change in the predicted value for a one-unit change in feature $x_i$, holding the other features constant.

# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0
# - pandas>=2.0.0, <2.2.0

"""
Example: Housing price prediction with linear regression
"""

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Generate sample data
np.random.seed(42)
n_samples = 200

data = pd.DataFrame({
    'square_feet': np.random.randint(500, 4000, n_samples),
    'bedrooms': np.random.randint(1, 6, n_samples),
    'age_years': np.random.randint(0, 50, n_samples),
    'distance_to_city': np.random.uniform(0, 50, n_samples)
})

# Target: Price (in ten thousands)
data['price'] = (
    data['square_feet'] * 0.5 +
    data['bedrooms'] * 50 -
    data['age_years'] * 5 -
    data['distance_to_city'] * 10 +
    np.random.normal(0, 100, n_samples)
)

X = data.drop('price', axis=1)
y = data['price']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardization (for comparing coefficients)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train model
model = LinearRegression()
model.fit(X_train_scaled, y_train)

# Interpret coefficients
coefficients = pd.DataFrame({
    'feature': X.columns,
    'coefficient': model.coef_,
    'abs_coefficient': np.abs(model.coef_)
}).sort_values('abs_coefficient', ascending=False)

print("Linear regression model coefficients:")
print(coefficients)
print(f"\nIntercept: {model.intercept_:.2f}")

print("\nInterpretation:")
print("- square_feet has the largest coefficient → Area most influences price")
print("- age_years has a negative coefficient → Older properties have lower prices")
print("- Coefficients are standardized, enabling direct comparison")

# Prediction example
sample = X_test_scaled[0:1]
prediction = model.predict(sample)[0]
print(f"\nSample predicted price: {prediction:.2f} (in ten thousands)")

Decision Trees

Decision trees have rule-based branching structures that are easy for humans to understand.

# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0
# - pandas>=2.0.0, <2.2.0

"""
Example: Iris classification with decision tree
"""

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

# Load data
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Decision tree model (limit depth for interpretability)
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)

# Accuracy
accuracy = model.score(X_test, y_test)
print(f"Accuracy: {accuracy:.2%}")

# Extract rules (text format)
from sklearn.tree import export_text
tree_rules = export_text(model, feature_names=list(iris.feature_names))
print("\nDecision tree rules:")
print(tree_rules[:500] + "...")  # Display first 500 characters only

# Interpretation example
print("\nInterpretation:")
print("- petal width (cm) <= 0.8 → classified as setosa")
print("- Otherwise, petal width or petal length determines versicolor/virginica")
print("- Decision boundaries are clear and understandable even for non-experts")

Rule-Based Models

Models composed of IF-THEN rules can be directly used as business rules.

# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0
# - pandas>=2.0.0, <2.2.0

"""
Example: Simple rule-based classifier
"""

import numpy as np
import pandas as pd

class SimpleRuleClassifier:
    """Interpretable rule-based classifier"""

    def __init__(self):
        self.rules = []

    def add_rule(self, condition, prediction, description=""):
        """Add a rule"""
        self.rules.append({
            'condition': condition,
            'prediction': prediction,
            'description': description
        })

    def predict(self, X):
        """Make predictions"""
        predictions = []
        for _, row in X.iterrows():
            prediction = None
            for rule in self.rules:
                if rule['condition'](row):
                    prediction = rule['prediction']
                    break
            predictions.append(prediction if prediction is not None else 0)
        return np.array(predictions)

    def explain(self):
        """Explain rules"""
        print("Classification rules:")
        for i, rule in enumerate(self.rules, 1):
            print(f"  Rule {i}: {rule['description']} → {rule['prediction']}")

# Usage example: Loan approval rules
classifier = SimpleRuleClassifier()

# Rule 1: High income and low debt
classifier.add_rule(
    condition=lambda row: row['income'] > 100000 and row['debt_ratio'] < 0.3,
    prediction=1,
    description="High income (>100K) and low debt ratio (<30%)"
)

# Rule 2: Medium income with good credit history
classifier.add_rule(
    condition=lambda row: row['income'] > 50000 and row['credit_history_months'] > 36,
    prediction=1,
    description="Medium income (>50K) and credit history >3 years"
)

# Rule 3: Reject all other cases
classifier.add_rule(
    condition=lambda row: True,
    prediction=0,
    description="All other cases"
)

# Test data
test_data = pd.DataFrame({
    'income': [120000, 60000, 30000],
    'debt_ratio': [0.2, 0.4, 0.6],
    'credit_history_months': [48, 40, 12]
})

predictions = classifier.predict(test_data)
classifier.explain()

print("\nPrediction results:")
for i, (pred, income) in enumerate(zip(predictions, test_data['income'])):
    print(f"  Applicant {i+1} (Income: ${income:,.0f}): {'Approved' if pred == 1 else 'Rejected'}")

GAM (Generalized Additive Models)

GAMs are interpretable models that can visualize the nonlinear effect of each feature.

Formula:

$$g(\mathbb{E}[y]) = \beta_0 + f_1(x_1) + f_2(x_2) + \cdots + f_n(x_n)$$

$f_i$ is a nonlinear function of feature $x_i$.

# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0
# - pandas>=2.0.0, <2.2.0

"""
Example: Modeling nonlinear relationships with GAM
"""

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Generate sample data (nonlinear relationships)
np.random.seed(42)
n_samples = 300

x1 = np.random.uniform(-3, 3, n_samples)
x2 = np.random.uniform(-3, 3, n_samples)

# Nonlinear relationships: sine function and quadratic function
y = np.sin(x1) + x2**2 + np.random.normal(0, 0.2, n_samples)

data = pd.DataFrame({'x1': x1, 'x2': x2, 'y': y})

# Feature engineering: per-feature polynomial basis (GAM approximation)
# PolynomialFeatures cannot keep pure powers while dropping interaction terms,
# so expand each feature separately to keep the model additive
from sklearn.preprocessing import PolynomialFeatures

X_parts, feature_names = [], []
for col in ['x1', 'x2']:
    poly = PolynomialFeatures(degree=3, include_bias=False)
    X_parts.append(poly.fit_transform(data[[col]]))
    feature_names.extend(poly.get_feature_names_out([col]))

X_poly = np.hstack(X_parts)

X_train, X_test, y_train, y_test = train_test_split(
    X_poly, data['y'], test_size=0.2, random_state=42
)

# Train with Ridge regression
model = Ridge(alpha=1.0)
model.fit(X_train, y_train)

print(f"Test R² score: {model.score(X_test, y_test):.3f}")

# Visualize the effect of each feature
print("\nPolynomial coefficients for each feature:")
coef_df = pd.DataFrame({
    'feature': feature_names,
    'coefficient': model.coef_
})
print(coef_df)

print("\nInterpretation:")
print("- Odd-degree terms of x1 are important → sine function-like nonlinearity")
print("- Quadratic term of x2 is important → quadratic function relationship")
print("- Effects of each variable can be interpreted individually")

1.4 Overview of Interpretation Techniques

Feature Importance

A family of methods that quantify how much each feature contributes to the model's predictions. Frequently used with tree-based models.
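
As a quick illustration, the sketch below contrasts impurity-based importance (model-specific, computed during random forest training) with permutation importance (model-agnostic, covered in detail in the next chapter). The synthetic dataset and model choices are illustrative assumptions, not from the text.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=500, n_features=5, n_informative=3, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importance: how much each feature reduces impurity in the trees
print("Impurity-based:", np.round(model.feature_importances_, 3))

# Permutation importance: shuffle one feature at a time and measure the score drop
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print("Permutation:   ", np.round(result.importances_mean, 3))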

Partial Dependence Plot (PDP)

Visualizes the relationship between a specific feature and model predictions.

Formula:

$$\text{PDP}(x_s) = \mathbb{E}_{x_c}[f(x_s, x_c)]$$

$x_s$ is the target feature, $x_c$ are the other features.
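
A minimal sketch of this averaging using scikit-learn's partial_dependence function; the synthetic regression dataset and gradient-boosting model are illustrative assumptions.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import partial_dependence

X, y = make_regression(n_samples=500, n_features=4, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Sweep feature 0 over a grid and average predictions over the other features,
# i.e. estimate E_{x_c}[f(x_s, x_c)] at each grid point
pd_result = partial_dependence(model, X, features=[0], grid_resolution=20)
print("Partial dependence (first 5 grid points):",
      np.round(pd_result["average"][0][:5], 2))

Plotting these averaged predictions against the grid values gives the PDP curve for that feature.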

SHAP (SHapley Additive exPlanations)

Uses Shapley values from game theory to calculate the contribution of each feature.

Characteristics:

  1. Based on a solid game-theoretic foundation (Shapley values), with properties such as consistency and local accuracy
  2. Additive: the per-feature contributions plus the base value sum to the model's prediction
  3. Model-agnostic in principle (KernelSHAP), with fast model-specific variants for tree ensembles (TreeSHAP)
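
A minimal sketch of computing SHAP values, assuming the third-party shap package is installed (pip install shap); the toy dataset and model are illustrative assumptions.

import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(300, 3)), columns=['f1', 'f2', 'f3'])
y = (X['f1'] + X['f2'] > 0).astype(int)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# TreeExplainer computes Shapley values efficiently for tree ensembles
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:5])

# Depending on the shap version, the result is a list (one array per class)
# or a single array; each entry holds per-sample, per-feature contributions
print(np.shape(shap_values))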

LIME (Local Interpretable Model-agnostic Explanations)

Explains individual predictions by approximating them locally with a linear model.

Procedure:

  1. Generate samples in the neighborhood of the instance to be predicted
  2. Obtain predictions from the black-box model
  3. Locally approximate with an interpretable model (such as linear regression)
  4. Interpret the coefficients of the approximate model
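
The sketch below illustrates this procedure using only scikit-learn (it is not the lime library itself): predictions of a black-box model around one instance are approximated with a locally weighted linear surrogate. The data, kernel width, and surrogate choice are illustrative assumptions.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

X, y = make_classification(n_samples=500, n_features=4, random_state=0)
black_box = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

x0 = X[0]                                                            # instance to explain
rng = np.random.default_rng(0)
neighborhood = x0 + rng.normal(scale=0.5, size=(1000, X.shape[1]))   # step 1: perturb
proba = black_box.predict_proba(neighborhood)[:, 1]                  # step 2: query black box

# Step 3: weight neighbors by proximity to x0 and fit a local linear surrogate
weights = np.exp(-np.sum((neighborhood - x0) ** 2, axis=1) / 2.0)
surrogate = Ridge(alpha=1.0).fit(neighborhood, proba, sample_weight=weights)

# Step 4: interpret the surrogate's coefficients as local feature contributions
print("Local feature contributions (surrogate coefficients):")
print(np.round(surrogate.coef_, 3))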

Saliency Maps

In image classification, saliency maps visualize which pixels are important for a prediction.

Calculation Method:

$$S(x) = \left| \frac{\partial f(x)}{\partial x} \right|$$

The gradient of the model's output with respect to the input image is computed, and pixels with large gradient magnitude are highlighted as important.
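
A minimal sketch of this gradient computation, assuming PyTorch is installed; the untrained toy model and random input stand in for a real image classifier.

import torch
import torch.nn as nn

# A tiny untrained classifier stands in for a real image model
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

x = torch.rand(1, 1, 28, 28, requires_grad=True)   # dummy "image"
score = model(x)[0].max()                          # score of the top class
score.backward()                                   # d(score)/d(pixels)

saliency = x.grad.abs().squeeze()                  # |∂f/∂x|, shape (28, 28)
print(saliency.shape)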


1.5 Evaluating Interpretability

Fidelity

Measures how accurately the interpretation method explains the behavior of the original model.

| Evaluation Metric | Description | Calculation Method |
|---|---|---|
| R² Score | Agreement between the explanation model and the original model | $R^2 = 1 - \frac{\sum(y_{\text{true}} - y_{\text{approx}})^2}{\sum(y_{\text{true}} - \bar{y})^2}$ |
| Local Fidelity | Agreement of local predictions | Prediction error on neighborhood samples |
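
A minimal sketch of measuring fidelity with a global surrogate: train an interpretable tree to mimic a black-box model and score their agreement with R². All data and models here are illustrative assumptions.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score

X, y = make_regression(n_samples=500, n_features=5, noise=10, random_state=0)
black_box = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# The surrogate is trained on the black box's predictions, not the true labels
y_bb = black_box.predict(X)
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y_bb)

fidelity = r2_score(y_bb, surrogate.predict(X))
print(f"Surrogate fidelity (R² vs. black box): {fidelity:.3f}")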

Consistency

Evaluates whether similar explanations are obtained for similar instances.

# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0
# - pandas>=2.0.0, <2.2.0

"""
Example: Evaluating consistency of interpretation
"""

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Sample data
np.random.seed(42)
n_samples = 500

data = pd.DataFrame({
    'feature1': np.random.normal(0, 1, n_samples),
    'feature2': np.random.normal(0, 1, n_samples),
    'feature3': np.random.normal(0, 1, n_samples)
})
data['target'] = (data['feature1'] + data['feature2'] > 0).astype(int)

X = data.drop('target', axis=1)
y = data['target']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(X_train, y_train)

# Compare Feature Importance for similar samples
sample1 = X_test.iloc[0:1]
sample2 = X_test.iloc[1:2]  # Similar sample

# Here we only compare the feature values and predictions of the two samples;
# in practice, compare their SHAP or LIME explanations

print("Sample 1 features:")
print(sample1.values)
print(f"Prediction: {model.predict(sample1)[0]}")

print("\nSample 2 features:")
print(sample2.values)
print(f"Prediction: {model.predict(sample2)[0]}")

# Calculate distance
distance = np.linalg.norm(sample1.values - sample2.values)
print(f"\nDistance between samples: {distance:.3f}")
print("Consistency evaluation: Need to verify if explanations for similar samples are similar")

Stability

Evaluates whether interpretations change significantly with minor changes in input data.
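
One simple sketch of a stability check (a proxy, not a standard metric): add small noise to the inputs, recompute feature importances, and compare. The dataset and perturbation scale are illustrative assumptions.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
rng = np.random.default_rng(0)
X_noisy = X + rng.normal(scale=0.01, size=X.shape)   # minor change to the inputs

imp_a = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y).feature_importances_
imp_b = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_noisy, y).feature_importances_

# Small differences suggest the interpretation is stable under minor perturbations
print("Max importance change:", np.max(np.abs(imp_a - imp_b)))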

Comprehensibility

Evaluates how easily humans can understand the explanation. Since quantification is difficult, user studies are common.

| Evaluation Method | Description |
|---|---|
| Number of Rules | Number of rules in decision trees or rule sets (fewer is more understandable) |
| Number of Features | Number of features used in the explanation (fewer is better) |
| User Study | Comprehension tests with actual users |
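
As a rough illustration of the first two metrics, the sketch below counts the rules (leaves) and distinct split features of a small decision tree; the dataset and depth limit are illustrative assumptions.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

n_rules = tree.get_n_leaves()                       # each leaf = one IF-THEN rule
used = tree.tree_.feature[tree.tree_.feature >= 0]  # split features of internal nodes
n_features_used = len(np.unique(used))

print(f"Number of rules (leaves): {n_rules}")
print(f"Number of features used: {n_features_used}")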

Practice Problems

Problem 1: Need for Model Interpretability

Problem: Explain why model interpretability is particularly important in the following scenarios.

  1. Bank loan approval system
  2. Medical image diagnosis support system
  3. Recommendation system

Sample Answer:

  1. Loan Approval: Obligation to explain rejection reasons (legal requirement), ensuring fairness, preventing discriminatory decisions
  2. Medical Diagnosis: Doctors' understanding of diagnostic reasoning, explanation to patients, reducing misdiagnosis risk, responding to medical malpractice litigation
  3. Recommendation: Improving user trust, transparency of recommendation reasons, bias detection (avoiding filter bubbles)
Problem 2: Global Interpretation and Local Interpretation

Problem: For a "customer churn prediction model," provide examples of information you would want to know from global interpretation and local interpretation respectively.

Sample Answer:

  1. Global interpretation: Which features drive churn across all customers (e.g., number of support calls, contract type, monthly charges), and how overall churn risk changes as each feature varies
  2. Local interpretation: Why a specific customer is predicted to churn, and which features would need to change (e.g., fewer support issues, a longer contract) to reduce that customer's risk

Problem 3: Choosing Interpretable Models

Problem: For the following scenarios, choose which interpretable model is appropriate and explain the reason.

  1. Housing price prediction (features: area, number of rooms, building age, etc.)
  2. Spam email classification (features: word frequency)
  3. Patient readmission risk prediction (features: age, diagnosis history, test values, etc.)

Sample Answer:

  1. Linear Regression: Coefficients of each feature directly indicate influence on price, making it understandable for real estate agents and customers
  2. Decision Tree or Rule-Based: Rules like "if the word 'free' appears 5+ times → spam" are intuitive
  3. GAM or Decision Tree: Can visualize nonlinear relationships (e.g., U-shaped relationship between age and readmission risk). Easy for doctors to understand diagnostic logic
Problem 4: Detecting Data Leakage

Problem: Explain how to detect data leakage using Feature Importance and provide a code example.

Sample Answer:

# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0
# - pandas>=2.0.0, <2.2.0

"""
Method for detecting data leakage via Feature Importance
"""
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Checklist of suspicious feature-name fragments
suspicious_features = [
    'id', 'timestamp', 'created_at', 'updated_at',
    'target', 'label', 'outcome'  # Target variable itself or direct proxies
]

# Small synthetic example with an ID column that leaks the target
np.random.seed(42)
data = pd.DataFrame({
    'feature_a': np.random.normal(0, 1, 500),
    'feature_b': np.random.normal(0, 1, 500)
})
data['target'] = ((data['feature_a'] > 0) & (data['feature_b'] > 0)).astype(int)
data = data.sort_values('target').reset_index(drop=True)
data['customer_id'] = np.arange(len(data))  # IDs assigned after sorting → leakage

X = data.drop('target', axis=1)
y = data['target']

# Calculate Feature Importance
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

feature_importance = pd.DataFrame({
    'feature': X.columns,
    'importance': model.feature_importances_
}).sort_values('importance', ascending=False)

# Check top features
top_features = feature_importance.head(5)
for _, row in top_features.iterrows():
    feature = row['feature']
    importance = row['importance']

    # Flag suspiciously named features that rank highly
    if any(suspect in feature.lower() for suspect in suspicious_features):
        print(f"⚠️ Possible data leakage: {feature} (importance: {importance:.3f})")

    # Flag abnormally high importance (>0.9)
    if importance > 0.9:
        print(f"⚠️ Abnormally high importance: {feature} (importance: {importance:.3f})")
Problem 5: Evaluating Interpretability

Problem: What metrics or methods can be used to evaluate "Fidelity" of interpretation techniques?

Sample Answer:

  1. Global surrogate R²: Train an interpretable surrogate (e.g., a shallow decision tree) on the black-box model's predictions and measure the R² (or accuracy) agreement between the two models
  2. Local fidelity: Generate samples in the neighborhood of an instance and measure the prediction error between the explanation model and the original model on those samples
  3. Deletion/insertion tests: Remove (or add) the features ranked most important by the explanation and check that the model's prediction changes accordingly

Problem 6: Implementation Challenge

Problem: Using the Titanic dataset (fetched via scikit-learn's fetch_openml, or any dataset of your choice), implement the following.

  1. Train a logistic regression model and interpret its coefficients
  2. Train a decision tree model and extract its rules
  3. Train a random forest model and visualize Feature Importance
  4. Compare the interpretability of the three models

Hint:

# Requirements:
# - Python 3.9+
# - pandas>=2.0.0, <2.2.0

"""
Example: Hint:

Purpose: Demonstrate core concepts and implementation patterns
Target: Beginner to Intermediate
Execution time: 1-5 minutes
Dependencies: None
"""

from sklearn.datasets import fetch_openml
import pandas as pd

# Load data
titanic = fetch_openml('titanic', version=1, as_frame=True, parser='auto')
df = titanic.frame

# Preprocessing (missing value handling, categorical encoding, etc.)
# ...

# Train and interpret models
# ...

Summary

In this chapter, we learned the basics of model interpretability:

  1. Why interpretability matters: trust and accountability, regulatory requirements, debugging, and bias detection
  2. How interpretability is classified: global vs. local, model-specific vs. model-agnostic, intrinsic vs. post-hoc
  3. Interpretable models: linear regression, decision trees, rule-based models, and GAMs
  4. An overview of interpretation techniques: Feature Importance, PDP, SHAP, LIME, and saliency maps
  5. How to evaluate interpretations: fidelity, consistency, stability, and comprehensibility

In the next chapter, we will learn in detail about Feature Importance and Permutation Importance.

