🧠 MLP Learning: Batch vs Stochastic Gradient Descent

Understanding how errors are handled during training

(Interactive demo: choose a learning mode and train the network; an animated diagram shows the input, hidden, and output layers with forward signals and backward gradients, alongside live readouts of the learning mode, current phase, average loss, epochs completed, and weight-update count.)

📚 Understanding Error Handling in One Epoch:

Stochastic Gradient Descent (SGD):

• Process Sample 1 → Calculate error → Update weights immediately
• Process Sample 2 → Calculate error → Update weights immediately
• Process Sample 3 → Calculate error → Update weights immediately
• Process Sample 4 → Calculate error → Update weights immediately
Result: 4 weight updates per epoch. Each sample's update takes effect immediately, so the next sample is processed with already-changed weights (see the sketch below).
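
Here is a minimal sketch of one SGD epoch. To keep it short it uses a single linear neuron with squared error rather than a full MLP, and the four-sample dataset and learning rate are made up for illustration; the point is that the weights change after every sample.

```python
# Toy setup (made up for illustration): a single linear neuron
# y_hat = w*x + b with squared-error loss, 4 samples, learning rate 0.1.
X = [0.0, 1.0, 2.0, 3.0]
Y = [1.0, 3.0, 5.0, 7.0]
w, b, lr = 0.0, 0.0, 0.1

updates, total_loss = 0, 0.0
for x, y in zip(X, Y):            # one SGD epoch
    y_hat = w * x + b             # forward pass
    error = y_hat - y             # this sample's error
    total_loss += 0.5 * error**2  # this sample's loss
    w -= lr * error * x           # update weights immediately...
    b -= lr * error               # ...so the next sample sees new weights
    updates += 1

print(f"SGD: {updates} weight updates, avg loss = {total_loss / len(X):.4f}")
```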

Batch Gradient Descent:

• Process Sample 1 → Calculate error → Store gradients
• Process Sample 2 → Calculate error → Accumulate gradients
• Process Sample 3 → Calculate error → Accumulate gradients
• Process Sample 4 → Calculate error → Accumulate gradients
• Average all gradients → Update weights ONCE
Result: 1 weight update per epoch. All samples influence the update equally (see the sketch below).
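
And a matching sketch of one batch-GD epoch under the same toy assumptions: gradients are accumulated over all four samples and the weights are updated once with their average.

```python
# Same toy setup as the SGD sketch (made up for illustration).
X = [0.0, 1.0, 2.0, 3.0]
Y = [1.0, 3.0, 5.0, 7.0]
w, b, lr = 0.0, 0.0, 0.1

grad_w, grad_b, total_loss = 0.0, 0.0, 0.0
for x, y in zip(X, Y):            # one batch-GD epoch
    y_hat = w * x + b             # forward pass: every sample sees the SAME weights
    error = y_hat - y
    total_loss += 0.5 * error**2
    grad_w += error * x           # accumulate gradients, no update yet
    grad_b += error

w -= lr * grad_w / len(X)         # average the gradients and update ONCE
b -= lr * grad_b / len(X)

print(f"Batch GD: 1 weight update, avg loss = {total_loss / len(X):.4f}")
```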

Key Difference: In SGD, weights change after each sample, so later samples see different weights than earlier ones. In Batch GD, all samples see the same weights, and we update based on the average gradient.

Average Loss Calculation: For both methods, we report the average loss across all samples in the epoch: Avg Loss = (Loss₁ + Loss₂ + Loss₃ + Loss₄) / 4
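
A small illustration of that reporting formula; the per-sample losses here are made-up numbers purely to show the arithmetic.

```python
# Reporting the epoch's average loss (same formula in both modes);
# the per-sample losses are hypothetical values for illustration only.
sample_losses = [0.42, 0.18, 0.31, 0.09]
avg_loss = sum(sample_losses) / len(sample_losses)    # (Loss1 + Loss2 + Loss3 + Loss4) / 4
print(f"Average loss for the epoch: {avg_loss:.4f}")  # 0.2500
```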