The Vanishing Gradient Problem

In deep neural networks, gradients are multiplied at each layer during backpropagation. Without skip connections, repeated multiplication by values smaller than 1 causes the gradients to shrink exponentially, so the lower layers barely learn. Residual connections create a "Gradient Highway" that lets gradients flow directly to earlier layers.

Fig. 1 | Side-by-side comparison of gradient flow during backpropagation. Left ("Without Residual Connections"): a traditional deep network shows the severe vanishing gradient problem, with gradients vanishing exponentially. Right ("With Residual Connections"): a residual network with skip connections maintains strong gradient flow through all layers via the Gradient Highway. Gradient strength in the visualization is color-coded as strong (≥ 0.7), medium (0.3 - 0.7), weak (0.1 - 0.3), or vanishing (< 0.1).

Why Gradients Vanish

In traditional backpropagation, gradients are multiplied at each layer:

  • The gradient passes through each layer's weights and the derivative of its activation function
  • In deep networks, this means many multiplications by values smaller than 1
  • The result is exponential decay: for example, 0.8^10 ≈ 0.107
  • The lowest layers therefore receive barely any learning signal (see the sketch below)
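
A minimal PyTorch sketch of this decay (the depth of 20, width of 32, sigmoid activations, and dummy loss are arbitrary illustrative choices): it stacks plain linear + sigmoid layers, runs one backward pass, and prints each layer's weight-gradient norm, which shrinks sharply toward the input.

```python
# Illustration of vanishing gradients in a plain (non-residual) network.
# Depth, width, activation, and loss are arbitrary choices for demonstration.
import torch
import torch.nn as nn

torch.manual_seed(0)

depth, width = 20, 32
layers = []
for _ in range(depth):
    layers += [nn.Linear(width, width), nn.Sigmoid()]
plain_net = nn.Sequential(*layers)

x = torch.randn(8, width)
loss = plain_net(x).pow(2).mean()  # dummy loss; only the gradients matter
loss.backward()

# Gradient norm per linear layer: it decreases sharply toward the input (layer 0).
for i, module in enumerate(plain_net):
    if isinstance(module, nn.Linear):
        print(f"layer {i:2d}: grad norm = {module.weight.grad.norm().item():.2e}")
```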

The Solution: Residual Connections

Skip connections create a "Gradient Highway":

  • A direct path for gradients through all layers
  • Formula: H(x) = F(x) + x instead of just F(x), where F(x) is the block's learned transformation
  • Because ∂H/∂x = ∂F/∂x + I, the gradient can pass through the skip connection unchanged
  • This enables training of networks with 100+ layers (see the sketch below)
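
A minimal sketch of such a residual block, assuming PyTorch; the two-layer form of F(x), the ReLU activation, and the width are illustrative choices rather than a specific published architecture.

```python
# Minimal residual block: H(x) = F(x) + x.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, width: int):
        super().__init__()
        # F(x): a small learned transformation (illustrative two-layer form).
        self.f = nn.Sequential(
            nn.Linear(width, width),
            nn.ReLU(),
            nn.Linear(width, width),
        )

    def forward(self, x):
        # The "+ x" term is the skip connection. Its local derivative is the
        # identity, so during backpropagation the incoming gradient reaches the
        # block input unchanged, no matter how small the gradient through f is.
        return self.f(x) + x

block = ResidualBlock(32)
x = torch.randn(8, 32)
print(block(x).shape)  # torch.Size([8, 32])
```

Note that F(x) and x must have the same shape for the addition; architectures that change the width between blocks typically add a projection on the skip path.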