Deep Learning · Medium · Asked at Google, Meta, NVIDIA

What is a 1x1 convolution and why is it useful?

For: Data Scientist, ML Engineer, AI / LLM Engineer

The short answer

A 1x1 convolution applies a learned linear combination across channels at each spatial position, without looking at any spatial neighbourhood. It is used to change the number of channels cheaply, to add non-linearity when followed by an activation, and to build the bottleneck blocks at the core of Inception and of ResNet-50 and deeper ResNets.

How to think about it

The spatial-vs-channel distinction is the key insight. Once you have explained it clearly, show the parameter saving in a bottleneck: that is what interviewers remember.

What it computes

At every spatial position (i, j), a 1×1 conv computes:

output[i, j, f] = b[f] + sum over c of (input[i, j, c] * W[f, c])

It mixes information across the channel dimension only: there is no neighbourhood and no spatial filtering. The result is a learned projection of the channel vector at each pixel, equivalent to applying the same fully connected layer independently at every position.
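The formula above can be sketched in plain Python to make the "per-pixel projection" idea concrete. This is a minimal illustration, not how a framework implements it: the function name conv1x1, the nested-list layout [H][W][C], and the example weights are all assumptions chosen for readability.

```python
def conv1x1(x, weight, bias):
    """Apply a 1x1 convolution.

    x:      nested list of shape [H][W][C_in]
    weight: nested list of shape [C_out][C_in]
    bias:   list of length C_out
    Returns a nested list of shape [H][W][C_out].
    """
    return [
        [
            [
                # Each output channel is a weighted sum of this pixel's channels only.
                bias[f] + sum(pixel[c] * weight[f][c] for c in range(len(pixel)))
                for f in range(len(weight))
            ]
            for pixel in row
        ]
        for row in x
    ]


# A 2x2 image with 3 channels, projected down to 2 channels.
x = [
    [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]],
    [[2.0, 0.0, 1.0], [1.0, 1.0, 1.0]],
]
weight = [
    [1.0, 0.0, -1.0],  # output channel 0: channel 0 minus channel 2
    [0.5, 0.5, 0.5],   # output channel 1: half the channel sum
]
bias = [0.0, 1.0]

y = conv1x1(x, weight, bias)
print(y[0][0])  # [-2.0, 4.0]
```

Note that no pixel ever reads from its neighbours: the spatial size stays 2x2 and only the channel count changes from 3 to 2.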

Parameter count

params = (1 * 1 * C_in + 1) * C_out = (C_in + 1) * C_out

This is about 9× cheaper than a 3×3 conv with the same channel counts: for C_in=256, C_out=256, a 3×3 costs (9*256+1)*256 ≈ 590 K parameters; a 1×1 costs only (256+1)*256 ≈ 66 K.
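The arithmetic is easy to check with a small helper. The function name conv_params is hypothetical, and it assumes a square kernel with one bias per output channel, matching the formula above.

```python
def conv_params(kernel_size, c_in, c_out, bias=True):
    """Learnable parameters in a square 2D convolution layer."""
    weights = kernel_size * kernel_size * c_in * c_out
    return weights + (c_out if bias else 0)


params_3x3 = conv_params(3, 256, 256)
params_1x1 = conv_params(1, 256, 256)
ratio = params_3x3 / params_1x1

print(params_3x3)  # 590080
print(params_1x1)  # 65792
print(round(ratio, 2))  # 8.97
```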

Use cases

Channel bottleneck (ResNet-50, Inception)

Compress channels before an expensive 3×3 conv, then expand again:

256ch → 1x1 → 64ch → 3x3 → 64ch → 1x1 → 256ch

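Parameters for the 3×3 drop from (9*256+1)*256 ≈ 590 K to (9*64+1)*64 ≈ 37 K, roughly 16× cheaper. Counting the two 1×1 convs as well, (256+1)*64 ≈ 16 K and (64+1)*256 ≈ 17 K, the whole block costs about 70 K parameters, still roughly 8× fewer than a single 256-channel 3×3 conv.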
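To see the effect on the whole block rather than just the 3×3 layer, the three layers can be summed. This sketch keeps a bias in every layer so the numbers line up with the formula used here; that is an assumption, since real ResNet implementations usually drop conv biases in favour of batch norm. The helper conv_params is hypothetical.

```python
def conv_params(kernel_size, c_in, c_out, bias=True):
    """Learnable parameters in a square 2D convolution layer."""
    weights = kernel_size * kernel_size * c_in * c_out
    return weights + (c_out if bias else 0)


squeeze = conv_params(1, 256, 64)   # 1x1: 256 -> 64 channels
spatial = conv_params(3, 64, 64)    # 3x3 at the reduced width
expand = conv_params(1, 64, 256)    # 1x1: 64 -> 256 channels
bottleneck_total = squeeze + spatial + expand

plain_3x3 = conv_params(3, 256, 256)  # single 3x3 at full width

print(squeeze, spatial, expand)  # 16448 36928 16640
print(bottleneck_total)          # 70016
print(round(plain_3x3 / bottleneck_total, 1))  # 8.4
```

The 16× figure applies to the 3×3 layer in isolation; once the two 1×1 layers are included, the block as a whole is about 8.4× smaller than the full-width 3×3.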

Projection shortcuts

In ResNet, when a block changes the channel count or the spatial resolution, the identity shortcut no longer matches the block output. A 1×1 conv (with matching stride) projects the skip connection to the right dimensions before addition.
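The shape alignment can be checked with the standard convolution output-size formula. The sizes below (a 56×56×256 input and a main branch that outputs 28×28×512) are illustrative assumptions, and conv_output_shape is a hypothetical helper.

```python
def conv_output_shape(height, width, c_out, kernel_size=1, stride=1, padding=0):
    """Output shape (H, W, C) of a 2D convolution."""
    h_out = (height + 2 * padding - kernel_size) // stride + 1
    w_out = (width + 2 * padding - kernel_size) // stride + 1
    return (h_out, w_out, c_out)


block_input = (56, 56, 256)
main_branch_output = (28, 28, 512)  # assumed: block halves resolution, doubles channels

# The identity shortcut cannot be added: shapes differ.
identity_matches = block_input == main_branch_output

# A 1x1 conv with stride 2 and 512 filters brings the shortcut to the same shape.
shortcut = conv_output_shape(56, 56, c_out=512, kernel_size=1, stride=2)

print(identity_matches)  # False
print(shortcut)          # (28, 28, 512)
```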

Pointwise mixing in depthwise-separable convolutions

MobileNet splits a standard conv into: (1) a depthwise conv that processes each channel independently, then (2) a 1×1 pointwise conv that mixes channels. The 1×1 step recovers cross-channel expressiveness at minimal cost.
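A quick parameter count shows why the split pays off. This sketch assumes a 3×3 kernel, 256 input and output channels, and a bias in every layer; the function names are hypothetical.

```python
def standard_conv_params(k, c_in, c_out):
    """k x k conv that mixes space and channels in one step."""
    return (k * k * c_in + 1) * c_out


def depthwise_params(k, channels):
    """One k x k filter (plus bias) per channel, no cross-channel mixing."""
    return (k * k + 1) * channels


def pointwise_params(c_in, c_out):
    """1x1 conv that mixes channels at each pixel."""
    return (c_in + 1) * c_out


standard = standard_conv_params(3, 256, 256)
depthwise = depthwise_params(3, 256)
pointwise = pointwise_params(256, 256)
separable = depthwise + pointwise

print(standard)   # 590080
print(depthwise)  # 2560
print(pointwise)  # 65792
print(round(standard / separable, 1))  # 8.6
```

Almost all of the separable layer's parameters sit in the 1×1 step, which is where the cross-channel mixing happens.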

Non-linearity injection

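A 1×1 conv followed by ReLU adds a cheap non-linear transformation, increasing the network’s representational power without enlarging the receptive field or paying for a spatial kernel. The non-linearity comes from the activation: the 1×1 conv on its own is linear.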
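A small example shows why the activation matters. Because a 1×1 conv acts on each pixel independently, it is enough to look at one pixel's channel vector. The weights below are hand-picked for illustration, and the helper names are hypothetical.

```python
def linear(v, weight, bias):
    """One 1x1 conv applied to a single pixel's channel vector."""
    return [b + sum(w * x for w, x in zip(row, v)) for row, b in zip(weight, bias)]


def relu(v):
    return [max(0.0, x) for x in v]


W1, b1 = [[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]
W2, b2 = [[1.0, 1.0]], [0.0]

pixel = [3.0, 1.0]

# Two stacked 1x1 convs with no activation collapse into one linear map W2 @ W1.
without_relu = linear(linear(pixel, W1, b1), W2, b2)
merged = [
    [sum(W2[f][k] * W1[k][c] for k in range(len(W1))) for c in range(len(W1[0]))]
    for f in range(len(W2))
]

# With ReLU in between, the same weights compute |x0 - x1|, which is not linear.
with_relu = linear(relu(linear(pixel, W1, b1)), W2, b2)

print(merged)        # [[0.0, 0.0]]
print(without_relu)  # [0.0]
print(with_relu)     # [2.0]
```

Without the ReLU, the second layer adds nothing a single 1×1 conv could not already represent; with it, the pair computes the absolute difference of the two channels.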
