Understanding how errors are handled during training
Select a learning mode and click "Train Network"
Stochastic Gradient Descent (SGD):
• Process Sample 1 → Calculate error → Update weights immediately
• Process Sample 2 → Calculate error → Update weights immediately
• Process Sample 3 → Calculate error → Update weights immediately
• Process Sample 4 → Calculate error → Update weights immediately
Result: 4 weight updates per epoch. Each sample immediately influences the network.
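
Below is a minimal per-sample SGD sketch of this loop. The one-weight linear model, the data points, and the learning rate are illustrative assumptions, not values taken from the tool above.

    # Per-sample SGD: update the weight immediately after every sample (illustrative setup).
    samples = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]  # hypothetical (x, target) pairs
    w = 0.0    # single weight; prediction is y_pred = w * x
    lr = 0.05  # learning rate (assumed)

    for epoch in range(3):
        for x, y in samples:
            y_pred = w * x
            error = y_pred - y
            grad = 2 * error * x   # gradient of the squared error w.r.t. w
            w -= lr * grad         # update weights immediately -> 4 updates per epoch
        print(f"epoch {epoch}: w = {w:.3f}")

Because the weight moves after every sample, sample 2 is already evaluated with a weight that sample 1 just changed.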
Batch Gradient Descent:
• Process Sample 1 → Calculate error → Store gradients
• Process Sample 2 → Calculate error → Accumulate gradients
• Process Sample 3 → Calculate error → Accumulate gradients
• Process Sample 4 → Calculate error → Accumulate gradients
• Average all gradients → Update weights ONCE
Result: 1 weight update per epoch. All samples influence the update equally.
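
For comparison, here is the same illustrative setup run in batch mode: gradients are accumulated over all samples, averaged, and applied in a single update per epoch.

    # Batch gradient descent: accumulate gradients, average, then update ONCE per epoch.
    samples = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]  # hypothetical (x, target) pairs
    w = 0.0
    lr = 0.05

    for epoch in range(3):
        grad_sum = 0.0
        for x, y in samples:
            y_pred = w * x             # every sample sees the same weight this epoch
            error = y_pred - y
            grad_sum += 2 * error * x  # store/accumulate, do not update yet
        w -= lr * (grad_sum / len(samples))  # average gradient -> 1 update per epoch
        print(f"epoch {epoch}: w = {w:.3f}")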
Key Difference: In SGD, the weights change after every sample, so later samples in an epoch see weights already adjusted by earlier ones. In Batch GD, every sample in the epoch sees the same weights, and the single update is based on the average gradient.
Average Loss Calculation: For both methods, we report the average loss across all samples in the epoch:
Avg Loss = (Loss₁ + Loss₂ + Loss₃ + Loss₄) / 4
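
As a quick worked example with hypothetical per-sample losses:

    # Average loss reported for one epoch; the four loss values are made up for illustration.
    losses = [0.50, 0.30, 0.20, 0.40]     # Loss1..Loss4
    avg_loss = sum(losses) / len(losses)  # (0.50 + 0.30 + 0.20 + 0.40) / 4
    print(avg_loss)                       # 0.35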