datarekha
Deep Learning · Easy · Asked at Google, Meta, Amazon

What do stride and padding control in a convolutional layer?

The short answer

Stride sets how many positions the kernel jumps between applications, controlling output resolution — stride 2 roughly halves spatial dimensions. Padding adds values (usually zeros) around the border so the kernel can be applied to edge pixels, letting you choose whether the output shrinks, stays the same, or (rarely) grows relative to the input.

How to think about it

Know the output size formula by heart and be able to derive the padding needed for “same” output size — you’ll be asked to compute it live.

Stride

Stride S is how far the kernel moves after each application.

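  • S=1: the kernel visits every position, so the output size is set by padding alone (equal to the input with same padding, k-1 smaller without padding)
  • S=2: the kernel skips every other position, so the output is roughly half the input size
  • Strided convolutions are widely used instead of pooling for downsampling in modern architectures (e.g., ResNet downsamples between stages with stride-2 convolutions at the start of each stage rather than with pooling layers, although its stem still contains one max-pooling layer)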
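To see where these output sizes come from, here is a minimal 1D sketch in plain Python that lists the start positions an unpadded kernel visits for each stride. The helper name `kernel_starts` is hypothetical, chosen for illustration.

```python
def kernel_starts(n_in: int, k: int, stride: int) -> list[int]:
    """Start index of each kernel window over an unpadded 1D input of length n_in."""
    # The last valid start is n_in - k, so the whole window still fits inside the input.
    return list(range(0, n_in - k + 1, stride))


for s in (1, 2):
    starts = kernel_starts(n_in=8, k=3, stride=s)
    print(f"stride={s}: starts={starts} -> {len(starts)} outputs")
# stride=1: starts=[0, 1, 2, 3, 4, 5] -> 6 outputs
# stride=2: starts=[0, 2, 4] -> 3 outputs
```

Each start position produces one output value, so counting the starts gives the output length directly.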

Padding

Padding P adds a border of values (zero-padding is standard) around the input before convolution.

Valid padding (P=0): no padding. The kernel only covers positions that fit entirely inside the input, so with S=1 each conv layer shrinks the spatial size by k-1.
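A short sketch of how quickly a stack of valid, stride-1 layers eats into the input, assuming 3 x 3 kernels and a 28 x 28 input. The helper `valid_stack_sizes` is hypothetical.

```python
def valid_stack_sizes(h_in: int, k: int, n_layers: int) -> list[int]:
    """Spatial size after each of n_layers valid (P=0), stride-1 convolutions."""
    sizes = [h_in]
    for _ in range(n_layers):
        # With P=0 and S=1 the output formula reduces to h - k + 1, a loss of k - 1.
        nxt = sizes[-1] - k + 1
        if nxt < 1:
            raise ValueError("input is too small for another valid layer")
        sizes.append(nxt)
    return sizes


print(valid_stack_sizes(28, k=3, n_layers=5))  # [28, 26, 24, 22, 20, 18]
```

Five layers already turn a 28 x 28 map into 18 x 18, which is why deep stacks normally pad.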

Same padding: P = floor(k/2) for odd kernel sizes. Output size equals input size when S=1. This is the standard choice to avoid shrinkage in deep stacks.
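Where P = floor(k/2) comes from: with S=1, set H_out = H_in in the output size formula given below and solve for P. That gives H_in + 2P - k + 1 = H_in, so 2P = k - 1, which has an integer solution only for odd k. A small sketch (helper names are hypothetical) that does this algebra and checks it:

```python
def same_padding(k: int) -> int:
    """Padding that keeps H_out == H_in at stride 1 (odd kernel sizes only)."""
    # From H_in + 2P - k + 1 = H_in  =>  P = (k - 1) / 2, an integer only for odd k.
    if k % 2 == 0:
        raise ValueError("symmetric 'same' padding needs an odd kernel size")
    return (k - 1) // 2  # equals k // 2 when k is odd


def out_size(h_in: int, k: int, p: int, s: int = 1) -> int:
    """H_out = floor((H_in + 2P - k) / S) + 1."""
    return (h_in + 2 * p - k) // s + 1


for k in (1, 3, 5, 7):
    p = same_padding(k)
    print(f"k={k}: P={p}, H_out for H_in=28 -> {out_size(28, k, p)}")
```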

Output size formula

H_out = floor((H_in + 2*P - k) / S) + 1
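One way to convince yourself the formula is right is to run a naive convolution and compare shapes. The sketch below is a plain-Python, single-channel cross-correlation with zero padding (the operation deep learning frameworks call convolution). `conv2d_naive` and `out_size` are illustrative names, not a library API, and the kernel is assumed to be square.

```python
def conv2d_naive(x, w, stride=1, padding=0):
    """Single-channel 2D cross-correlation with zero padding (square k x k kernel)."""
    h, wd = len(x), len(x[0])
    k = len(w)
    # Build the zero-padded input: `padding` rows/columns of zeros on every side.
    hp, wp = h + 2 * padding, wd + 2 * padding
    xp = [[0.0] * wp for _ in range(hp)]
    for i in range(h):
        for j in range(wd):
            xp[i + padding][j + padding] = x[i][j]
    out = []
    # Slide the kernel in steps of `stride`; the last start index leaves room for k values.
    for i in range(0, hp - k + 1, stride):
        row = []
        for j in range(0, wp - k + 1, stride):
            row.append(sum(xp[i + a][j + b] * w[a][b] for a in range(k) for b in range(k)))
        out.append(row)
    return out


def out_size(h_in, k, p, s):
    """H_out = floor((H_in + 2P - k) / S) + 1."""
    return (h_in + 2 * p - k) // s + 1


x = [[1.0] * 28 for _ in range(28)]  # 28 x 28 input of ones
w = [[1.0] * 3 for _ in range(3)]    # 3 x 3 kernel of ones
for p, s in [(0, 1), (1, 1), (1, 2), (0, 2)]:
    y = conv2d_naive(x, w, stride=s, padding=p)
    print(f"P={p} S={s}: {len(y)}x{len(y[0])} (formula: {out_size(28, 3, p, s)})")
```

The number of iterations of `range(0, hp - k + 1, stride)` is exactly floor((hp - k) / S) + 1, which is where the floor and the +1 in the formula come from.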

Examples with H_in=28, k=3:

Padding (P)   Stride (S)   H_out
0 (valid)     1            26
1 (same)      1            28
1 (same)      2            14
0 (valid)     2            13
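The floor in the formula is not just rounding: when H_in + 2P - k is not a multiple of S, the last input rows are never visited by any window. A sketch that checks this for the four rows of the table above, using a hypothetical helper `rows_never_covered`:

```python
def rows_never_covered(h_in: int, k: int, p: int, s: int) -> list[int]:
    """Original (unpadded) input rows that no kernel window touches."""
    hp = h_in + 2 * p
    covered = set()
    for start in range(0, hp - k + 1, s):
        covered.update(range(start, start + k))
    # Padded row r + p corresponds to original row r; padding rows are ignored.
    return [r for r in range(h_in) if r + p not in covered]


for p, s in [(0, 1), (1, 1), (1, 2), (0, 2)]:
    print(f"P={p} S={s}: rows never covered = {rows_never_covered(28, 3, p, s)}")
```

With P=0, S=2 the last window starts at row 24 and covers rows 24 to 26, so row 27 never reaches the output; with P=1 the windows shift by one and every real row is covered.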

Practical design rules

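  • Use S=1, P=1 (same padding for a 3 x 3 kernel) in feature-extraction blocks that should preserve resolution.
  • Use S=2, P=1 with a 3 x 3 kernel to halve spatial dimensions at a stage boundary instead of a pooling layer (exactly half for even sizes, ceil(H_in/2) for odd ones).
  • Use S=1, P=0 for 1 x 1 convolutions: with k=1, floor(k/2) = 0, so valid and same padding coincide and the output keeps the input size. Any nonzero padding would only grow the output with border values computed from zeros.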
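Putting these rules together, here is a shape trace through a small stack starting from 28 x 28. The layer list is made up for illustration and is not taken from any specific architecture.

```python
def out_size(h_in: int, k: int, p: int, s: int) -> int:
    """H_out = floor((H_in + 2P - k) / S) + 1."""
    return (h_in + 2 * p - k) // s + 1


# Hypothetical stack for illustration: (description, k, P, S)
layers = [
    ("3x3 same", 3, 1, 1),
    ("3x3 same", 3, 1, 1),
    ("3x3 stride-2 downsample", 3, 1, 2),
    ("1x1 projection", 1, 0, 1),
    ("3x3 stride-2 downsample", 3, 1, 2),
]

h = 28
for name, k, p, s in layers:
    h = out_size(h, k, p, s)
    print(f"{name:>24}: {h}x{h}")
```

The trace goes 28, 28, 14, 14, 7. Giving the 1 x 1 layer P=1 instead would turn its 14 x 14 input into 16 x 16, with a ring of outputs computed purely from the zero padding.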
