When we build machine learning models, we want them to do one thing well: make accurate predictions on new, unseen data.
But there are two common pitfalls that ruin that goal:
- ❌ Overfitting – your model learns the training data too well, including its noise and quirks.
- ❌ Underfitting – your model is too simple to capture the underlying pattern in the data.
In this guide, I’ll break down both concepts not with confusing math, but with visual explanations, relatable analogies, and real-world code examples in Python.
By the end, you’ll understand:
- What overfitting and underfitting look like in real datasets
- Why these problems occur
- How to detect them using learning curves and validation scores
- Practical ways to fix them
What Are Overfitting and Underfitting?
Imagine you’re preparing for an exam:
- If you memorize every question from a practice test but don’t understand the concepts, you’ll fail when the questions change. That’s overfitting.
- If you just skim the book and barely learn anything, you’ll fail because you don’t know enough. That’s underfitting.
ML models behave the same way.
Definitions and Key Differences
| Aspect | Overfitting | Underfitting |
| --- | --- | --- |
| Model Behavior | Too complex, learns noise | Too simple, misses patterns |
| Training Accuracy | Very high | Low |
| Test Accuracy | Poor | Poor |
| Generalization | Weak | Weak |
| Fix Strategy | Simplify, regularize | Add complexity, better features |
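You can check the training-versus-test gap from this table in a couple of lines. Here’s a minimal sketch; the synthetic dataset and the unconstrained decision tree are illustrative choices, not part of the examples later in this guide:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data, purely for illustration
X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unconstrained tree can memorize the training set almost perfectly
model = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
print("Train R²:", model.score(X_train, y_train))  # near 1.0 -> memorized
print("Test R²:", model.score(X_test, y_test))     # much lower -> overfitting
```

A large gap between the two scores is the classic overfitting signature from the table above.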
Real-Life Analogy
Let’s say you’re learning to drive:
- Underfitting: You only read the rules but never practice, so you fail even in familiar settings.
- Overfitting: You only drive on your home street and memorize every bump; the moment you go to a new area, you panic.
A good driver (or ML model) should generalize to new roads (data).
Visual Explanation with Python Code
Let’s simulate this using Scikit-learn:
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

# Generate noisy samples from a sine curve
np.random.seed(0)
X = np.sort(5 * np.random.rand(80, 1), axis=0)
y = np.sin(X).ravel() + np.random.normal(0, 0.1, X.shape[0])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Fit and plot three model complexities side by side
degrees = [1, 4, 15]
plt.figure(figsize=(18, 4))
for i, d in enumerate(degrees):
    model = make_pipeline(PolynomialFeatures(d), LinearRegression())
    model.fit(X_train, y_train)
    plt.subplot(1, 3, i + 1)
    plt.scatter(X, y, color='gray', label="Actual data")
    plt.plot(X, model.predict(X), label=f"Degree {d}", linewidth=2)
    plt.title(f"Model with Degree {d}")
    plt.legend()
plt.show()
```
Interpretation:
- Degree 1: underfits (can’t capture the sine wave)
- Degree 15: overfits (wavy, unnatural fit)
- Degree 4: best generalization
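To put numbers on the picture, compare training and test error for each degree. This sketch reuses the variables from the block above; exact values will vary with the random split:

```python
from sklearn.metrics import mean_squared_error

# Train/test MSE per degree (reuses X_train, X_test, y_train, y_test from above)
for d in [1, 4, 15]:
    m = make_pipeline(PolynomialFeatures(d), LinearRegression()).fit(X_train, y_train)
    print(f"Degree {d}: "
          f"train MSE = {mean_squared_error(y_train, m.predict(X_train)):.4f}, "
          f"test MSE = {mean_squared_error(y_test, m.predict(X_test)):.4f}")
```

Degree 15 should show the tell-tale pattern: very low training error but a noticeably worse test error.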
Learning Curves – How to Detect Overfitting/Underfitting

Plot training and validation scores over increasing dataset sizes:
```python
from sklearn.model_selection import learning_curve

# Learning curve for the overfit (degree-15) model
train_sizes, train_scores, val_scores = learning_curve(
    make_pipeline(PolynomialFeatures(15), LinearRegression()),
    X, y, cv=5, scoring='neg_mean_squared_error',
    train_sizes=np.linspace(0.1, 1.0, 5)
)

# Scores are negative MSE, so flip the sign to get errors
train_mean = -np.mean(train_scores, axis=1)
val_mean = -np.mean(val_scores, axis=1)

plt.plot(train_sizes, train_mean, 'o-', label='Training Error')
plt.plot(train_sizes, val_mean, 'o-', label='Validation Error')
plt.title("Learning Curve (Overfitting Example)")
plt.xlabel("Training Set Size")
plt.ylabel("Mean Squared Error")
plt.legend()
plt.grid()
plt.show()
```
What to Look For:
- Overfitting: Large gap between training and validation error.
- Underfitting: Both errors are high and close together.
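Learning curves vary the amount of data; a close cousin, the validation curve, varies model complexity instead. Here’s a sketch that sweeps the polynomial degree on the same sine data (the parameter name polynomialfeatures__degree is the one make_pipeline auto-generates for the PolynomialFeatures step):

```python
from sklearn.model_selection import validation_curve

# Sweep the degree from 1 to 15 with 5-fold cross-validation
degrees = np.arange(1, 16)
train_scores, val_scores = validation_curve(
    make_pipeline(PolynomialFeatures(), LinearRegression()),
    X, y, param_name="polynomialfeatures__degree", param_range=degrees,
    cv=5, scoring='neg_mean_squared_error'
)

plt.plot(degrees, -train_scores.mean(axis=1), 'o-', label='Training Error')
plt.plot(degrees, -val_scores.mean(axis=1), 'o-', label='Validation Error')
plt.yscale("log")  # validation error can explode at high degrees
plt.xlabel("Polynomial Degree")
plt.ylabel("Mean Squared Error")
plt.legend()
plt.show()
```

Validation error typically bottoms out near the sweet-spot degree and climbs again once the model starts fitting noise.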
How to Fix Overfitting
- Simplify the model – use fewer features or reduce the polynomial degree.
- Regularization – add penalties to large weights. Use Ridge, Lasso, or ElasticNet (see the sketch after this list).
- More data – the more diverse data you feed the model, the better it learns general patterns.
- Early stopping – for iterative algorithms (like neural networks), stop training before the model overfits.
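As a concrete example of the regularization fix, here’s a sketch that tames the overfit degree-15 model from earlier. The alpha value is an assumed starting point rather than a tuned one, and the StandardScaler is there so Ridge penalizes all polynomial features on a comparable scale:

```python
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Plain degree-15 fit vs. the same features with an L2 penalty
plain = make_pipeline(PolynomialFeatures(15), LinearRegression())
ridge = make_pipeline(PolynomialFeatures(15), StandardScaler(),
                      Ridge(alpha=1.0))  # alpha=1.0 is an assumed value, not tuned

plain.fit(X_train, y_train)
ridge.fit(X_train, y_train)

print("Plain test R²:", plain.score(X_test, y_test))
print("Ridge test R²:", ridge.score(X_test, y_test))
```

Exact scores depend on the split, but the regularized model should generalize noticeably better.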
How to Fix Underfitting
- Add more features – if your model is too simple, give it richer information.
- Increase model complexity – switch from linear to polynomial, or try a more flexible algorithm.
- Decrease regularization – too much penalty can make the model too constrained (see the sketch after this list).
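To see the third fix in action, here’s a sketch on the same sine data comparing an over-regularized Ridge model with a relaxed one; the alpha values are illustrative, and in practice you’d pick them with cross-validation:

```python
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# An over-constrained model vs. a relaxed one (alpha values are illustrative)
too_stiff = make_pipeline(PolynomialFeatures(4), StandardScaler(), Ridge(alpha=1e4))
relaxed = make_pipeline(PolynomialFeatures(4), StandardScaler(), Ridge(alpha=0.01))

print("alpha=1e4 test R²:", too_stiff.fit(X_train, y_train).score(X_test, y_test))
print("alpha=0.01 test R²:", relaxed.fit(X_train, y_train).score(X_test, y_test))
```

With a huge penalty the model collapses toward predicting the mean (R² near zero), which is underfitting by another name.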
Real-World Example: Predicting Housing Prices
Let’s compare two models on a dataset:
```python
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import r2_score

data = fetch_california_housing()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Underfit candidate: a linear model given only the first 2 features
model_under = LinearRegression()
model_under.fit(X_train[:, :2], y_train)

# Richer candidate: all features with light regularization
model_over = Ridge(alpha=0.1)
model_over.fit(X_train, y_train)

print("Underfit R²:", r2_score(y_test, model_under.predict(X_test[:, :2])))
print("Full-model R²:", r2_score(y_test, model_over.predict(X_test)))
```
Result:
- Using too few features → underfitting
- Many features with very weak regularization → risk of overfitting
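A single random split can be noisy, so a fairer comparison uses cross-validation. A quick sketch (scores vary slightly from run to run):

```python
from sklearn.model_selection import cross_val_score

# 5-fold cross-validated R²: two features vs. all features
cv_under = cross_val_score(LinearRegression(), X[:, :2], y, cv=5)
cv_full = cross_val_score(Ridge(alpha=0.1), X, y, cv=5)

print("2-feature model, mean CV R²:", cv_under.mean())
print("All-feature model, mean CV R²:", cv_full.mean())
```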
Common Mistakes to Avoid
| Mistake | Fix |
| --- | --- |
| Using test data for tuning | Keep a separate validation set (see the sketch below) |
| Ignoring validation performance | Monitor both train and validation metrics |
| Blindly increasing model depth | More isn’t always better |
| No regularization | Use L2/L1 penalties to avoid overfitting |
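For the first mistake in the table, here’s a minimal sketch of a clean train/validation/test protocol, reusing the housing data from above; the alpha grid and split sizes are illustrative:

```python
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Hold out a test set once; tune only against the validation set
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)

best_alpha, best_score = None, float("-inf")
for alpha in [0.01, 0.1, 1.0, 10.0]:
    score = Ridge(alpha=alpha).fit(X_train, y_train).score(X_val, y_val)
    if score > best_score:
        best_alpha, best_score = alpha, score

# The test set is touched exactly once, at the very end
final = Ridge(alpha=best_alpha).fit(X_train, y_train)
print(f"Chosen alpha: {best_alpha}, test R²: {final.score(X_test, y_test):.3f}")
```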
Read More On This Topic
- What Is Data Science? The Complete Beginner’s Guide
- Exploratory Data Analysis (EDA) in Python: How to Uncover Insights from Your Data
- Data Visualization with Python – Matplotlib, Seaborn, Plotly
- Feature Engineering Techniques for Better Models
- Data Cleaning in Python: How to Handle Messy, Missing, and Incorrect Data