What do stride and padding control in a convolutional layer?
Stride sets how many positions the kernel jumps between applications, controlling output resolution — stride 2 roughly halves spatial dimensions. Padding adds values (usually zeros) around the border so the kernel can be applied to edge pixels, letting you choose whether the output shrinks, stays the same, or (rarely) grows relative to the input.
How to think about it
Know the output size formula by heart and be able to derive the padding needed for “same” output size — you’ll be asked to compute it live.
Stride
Stride S is how far the kernel moves after each application.
- S=1: dense coverage, output is nearly the same size as input
- S=2: skip every other position, output is roughly half the size
- Strided convolutions are now preferred over pooling for downsampling in modern architectures (e.g., ResNet uses stride=2 in the first conv of a block instead of a pooling layer)
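To make the effect of stride concrete, here is a minimal pure-Python sketch of a 1D convolution (strictly, cross-correlation) with no padding. The helper name `conv1d` and the toy signal are illustrative assumptions, not part of any library.

```python
def conv1d(signal, kernel, stride=1):
    """Naive 1D cross-correlation with no padding (illustrative helper)."""
    k = len(kernel)
    out = []
    # The kernel's start index advances by `stride` after each application.
    for start in range(0, len(signal) - k + 1, stride):
        window = signal[start:start + k]
        out.append(sum(w * x for w, x in zip(kernel, window)))
    return out

signal = list(range(8))  # [0, 1, ..., 7]
kernel = [1, 1, 1]       # sums each window of 3 values

dense = conv1d(signal, kernel, stride=1)
strided = conv1d(signal, kernel, stride=2)

print(len(dense), dense)      # 6 [3, 6, 9, 12, 15, 18]
print(len(strided), strided)  # 3 [3, 9, 15]
```

With stride 2 the result is every other element of the stride-1 output, which is why the length drops from 6 to 3.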
Padding
Padding P adds a border of values (zero-padding is standard) around the input before convolution.
Valid padding (P=0): no padding. The kernel only covers positions that fit entirely inside the input. At S=1, each conv layer shrinks the spatial size by k-1.
Same padding: P = floor(k/2) for odd kernel sizes. Output size equals input size when S=1. This is the standard choice to avoid shrinkage in deep stacks.
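The difference between valid and same padding can be checked by listing the kernel windows directly. The sketch below uses a toy 1D row; `zero_pad` and `windows` are hypothetical helper names chosen for illustration.

```python
def zero_pad(row, p):
    """Return a copy of `row` with p zeros added on each side."""
    return [0] * p + list(row) + [0] * p

def windows(row, k):
    """All stride-1 windows of width k that fit entirely inside `row`."""
    return [row[i:i + k] for i in range(len(row) - k + 1)]

row = [1, 2, 3, 4, 5]
k = 3

valid = windows(row, k)                   # P = 0
same = windows(zero_pad(row, k // 2), k)  # P = floor(k/2) = 1

print(len(valid))  # 3: the length shrinks by k - 1 = 2
print(len(same))   # 5: matches the input length
print(same[0])     # [0, 1, 2]: first window is centered on the edge value 1
```

Without padding, the edge values 1 and 5 are never at the center of a window. With one zero on each side, every input value gets a window centered on it.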
Output size formula
H_out = floor((H_in + 2*P - k) / S) + 1
Examples with H_in=28, k=3:
| Padding | Stride | H_out |
|---|---|---|
| 0 (valid) | 1 | 26 |
| 1 (same) | 1 | 28 |
| 1 (same) | 2 | 14 |
| 0 (valid) | 2 | 13 |
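The table rows can be reproduced with a few lines of Python. This is a sketch of the formula for one spatial dimension; the function name `conv_output_size` is an assumption for illustration, and Python's `//` serves as the floor for these non-negative values.

```python
def conv_output_size(h_in, k, stride=1, padding=0):
    """H_out = floor((H_in + 2*P - k) / S) + 1 for one spatial dimension."""
    return (h_in + 2 * padding - k) // stride + 1

# (padding, stride) pairs from the table, with H_in=28 and k=3.
for padding, stride in [(0, 1), (1, 1), (1, 2), (0, 2)]:
    h_out = conv_output_size(28, 3, stride=stride, padding=padding)
    print(f"P={padding} S={stride} -> H_out={h_out}")
```

In the last row, (28 - 3) / 2 = 12.5 floors to 12, giving 13 outputs. The floor means the final input position is never covered by the kernel.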
Practical design rules
- Use S=1, P=1 (same padding) for feature-extracting blocks that should preserve resolution.
- Use S=2, P=1 to halve spatial dimensions at a stage boundary instead of a pooling layer.
- Use S=1, P=0 (valid) for 1 x 1 convolutions; padding is irrelevant for kernel size 1.
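As a rough check on these rules, the sketch below traces the spatial size through a small stack of layers. It assumes 3 x 3 kernels for the two P=1 rules; the 56-pixel input, the layer list, and the helper name `trace_sizes` are made up for illustration.

```python
def trace_sizes(h_in, layers):
    """Return the spatial size after each (k, stride, padding) layer."""
    sizes = [h_in]
    for k, stride, padding in layers:
        # Same output size formula, applied to the previous layer's output.
        sizes.append((sizes[-1] + 2 * padding - k) // stride + 1)
    return sizes

# Hypothetical stage: (kernel size, stride, padding) per layer.
layers = [
    (3, 1, 1),  # preserve resolution
    (3, 2, 1),  # halve at the stage boundary
    (1, 1, 0),  # 1 x 1 conv, no padding needed
]

print(trace_sizes(56, layers))  # [56, 56, 28, 28]
```

Tracing sizes this way before building a network is a cheap way to catch a layer that shrinks the feature map unexpectedly.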