datarekha
Deep Learning · Medium · Asked at Google, Meta, Amazon, NVIDIA

How do you count the number of trainable parameters in a convolutional layer?

The short answer

Each filter has k*k*C_in weights plus one bias, so a layer with C_out filters has (k*k*C_in + 1)*C_out parameters. The count is independent of the input's spatial dimensions H and W because the same filter weights are reused at every position, and this weight sharing is what makes CNNs so parameter-efficient.

How to think about it

Interviewers give a concrete layer spec and ask you to compute on the spot. Know the formula cold and practise a few examples.

The formula

A standard 2D conv layer (square kernel, no channel grouping) is defined by kernel size k, input channels C_in, and output channels C_out:

params = (k * k * C_in + 1) * C_out

The +1 is the per-filter bias. If bias=False (common when followed by BatchNorm), drop it:

params = k * k * C_in * C_out

Spatial dimensions H and W do not appear — the same kernel tiles across the whole map.
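
The formula above can be sketched as a small helper. This is a minimal illustration in plain Python, not any library's API; the function name conv2d_param_count and its arguments are hypothetical, and it assumes a square kernel with no channel grouping.

```python
def conv2d_param_count(k: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Trainable parameters in a standard 2D conv layer.

    Assumes a square k x k kernel and no channel grouping.
    Note that H and W are not arguments: they do not affect the count.
    """
    weights_per_filter = k * k * c_in
    bias_per_filter = 1 if bias else 0
    return (weights_per_filter + bias_per_filter) * c_out


# Illustrative layer: 3x3 kernel, 16 input channels, 32 filters.
print(conv2d_param_count(k=3, c_in=16, c_out=32))              # 4640
print(conv2d_param_count(k=3, c_in=16, c_out=32, bias=False))  # 4608
```

The two results differ by exactly C_out = 32, the one bias per filter.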

Worked examples

Example 1 — typical first conv layer

Input 224 x 224 x 3, kernel 7 x 7, 64 filters, bias=True:

(7 * 7 * 3 + 1) * 64 = (147 + 1) * 64 = 9,472

Example 2 — deeper layer

Input 28 x 28 x 128, kernel 3 x 3, 256 filters, bias=False:

3 * 3 * 128 * 256 = 294,912

Example 3 — 1×1 convolution

Input 14 x 14 x 512, kernel 1 x 1, 128 filters, bias=False:

1 * 1 * 512 * 128 = 65,536
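
One way to double-check the three examples is to write down the shape of each trainable tensor and multiply out its dimensions. The sketch below does that in plain Python; the helper names are hypothetical, and the (C_out, C_in, k, k) weight layout is an assumed convention (it matches the layout PyTorch uses for its 2D conv weights). The input's height and width appear only in the labels, never in the arithmetic.

```python
from math import prod


def conv2d_tensor_shapes(k, c_in, c_out, bias=True):
    """Shapes of the trainable tensors, assuming a (C_out, C_in, k, k) weight layout."""
    shapes = {"weight": (c_out, c_in, k, k)}
    if bias:
        shapes["bias"] = (c_out,)
    return shapes


def total_params(shapes):
    """Sum the element counts of all tensors."""
    return sum(prod(shape) for shape in shapes.values())


examples = {
    "Example 1 (224 x 224 x 3 input)": dict(k=7, c_in=3, c_out=64, bias=True),
    "Example 2 (28 x 28 x 128 input)": dict(k=3, c_in=128, c_out=256, bias=False),
    "Example 3 (14 x 14 x 512 input)": dict(k=1, c_in=512, c_out=128, bias=False),
}

for name, spec in examples.items():
    print(name, total_params(conv2d_tensor_shapes(**spec)))
# Example 1 -> 9472, Example 2 -> 294912, Example 3 -> 65536
```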

Contrast with a dense layer

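A fully-connected layer mapping 512 units to 256 units needs 512 * 256 + 256 = 131,328 parameters, and the count scales with the product of input and output size. If the input is a flattened feature map, its size is H * W * C_in, so the dense count grows with spatial resolution. A conv layer's count scales only with kernel area and channel counts, not with spatial resolution.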
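
To make the contrast concrete, the sketch below compares the conv layer from Example 2 with a hypothetical dense layer that maps the flattened feature map to 256 units. The dense comparison is an illustration chosen here, not a layer anyone would normally build, and both helper names are assumptions.

```python
def dense_param_count(n_in: int, n_out: int, bias: bool = True) -> int:
    """Parameters in a fully-connected layer: one weight per input-output pair."""
    return n_in * n_out + (n_out if bias else 0)


def conv2d_param_count(k: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Parameters in a standard 2D conv layer (square kernel, no grouping)."""
    return (k * k * c_in + (1 if bias else 0)) * c_out


print(dense_param_count(512, 256))  # 131328, the dense layer from the text

# Same 3x3, 128 -> 256 conv at two resolutions, next to a dense layer
# fed the flattened H * W * 128 feature map (hypothetical comparison).
for h, w in [(28, 28), (56, 56)]:
    conv = conv2d_param_count(3, 128, 256, bias=False)
    dense = dense_param_count(h * w * 128, 256, bias=False)
    print(f"{h}x{w}: conv={conv:,} dense={dense:,}")
```

Doubling the height and width leaves the conv count unchanged and multiplies the dense count by four.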

BatchNorm parameters

If a BatchNorm layer follows the conv, it adds 2 * C_out learnable parameters (scale γ and shift β, one pair per channel). It also keeps two running statistics per channel (mean and variance, 2 * C_out values in total). These are updated during training but not by backprop, so they do not count as trainable parameters.
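
The bookkeeping for a conv + BatchNorm pair can be sketched as follows. The helper names are hypothetical, and the conv is assumed to use bias=False because BatchNorm follows it.

```python
def batchnorm_counts(c_out: int) -> dict:
    """Per-channel BatchNorm values, split into learnable and non-learnable."""
    return {
        "learnable": 2 * c_out,  # scale (gamma) and shift (beta)
        "buffers": 2 * c_out,    # running mean and running variance
    }


def conv_bn_trainable(k: int, c_in: int, c_out: int) -> int:
    """Trainable parameters in a bias-free conv followed by BatchNorm."""
    conv = k * k * c_in * c_out
    return conv + batchnorm_counts(c_out)["learnable"]


# Example 2 (3x3, 128 -> 256, bias=False) followed by BatchNorm:
print(batchnorm_counts(256))           # {'learnable': 512, 'buffers': 512}
print(conv_bn_trainable(3, 128, 256))  # 295424
```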
