Deep Learning | Easy | Asked at Google, Meta, Amazon

What are filters and feature maps in a CNN, and what do they represent?

For: Data Scientist, ML Engineer, AI / LLM Engineer

The short answer

A filter (kernel) is the set of learned weights that the network applies at each spatial position; a feature map is the spatial grid of responses produced when one filter slides over the input. Each filter detects one type of pattern, and the full stack of feature maps across all filters constitutes the layer's output representation.

How to think about it

This is often the entry-point question before deeper CNN probes. Be precise about the tensor shapes and what each dimension represents, then connect them to the visualisation work that shows what filters actually learn.

Filters (kernels)

A filter is a 3-D tensor of shape k x k x C_in. Each element is a learned weight. The filter scans the input, and at every spatial position it computes the dot product between its weights and the k x k x C_in input patch beneath it (usually plus a learned bias). A conv layer has C_out such filters, one per output channel.

Filter shape: k x k x C_in — spatial extent times depth of input
Number of filters: C_out — one per output channel
Total weight tensor: k x k x C_in x C_out (plus C_out biases, if the layer uses them)
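To make the shape arithmetic concrete, here is a minimal Python sketch that counts a conv layer's learnable parameters. It assumes a standard (non-grouped) convolution with one bias per filter, which is the common default but not something the text states; `conv2d_param_count` is a hypothetical helper name.

```python
def conv2d_param_count(k: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Learnable parameters of a standard (non-grouped) 2-D conv layer (assumed setup)."""
    weights = k * k * c_in * c_out   # C_out filters, each of shape k x k x C_in
    biases = c_out if bias else 0    # assumption: one scalar bias per filter / output channel
    return weights + biases

# Example: a 3 x 3 conv mapping 64 input channels to 128 output channels
print(conv2d_param_count(3, 64, 128))              # 73856
print(conv2d_param_count(3, 64, 128, bias=False))  # 73728
```

Frameworks store the same weights in different axis orders (PyTorch's `Conv2d.weight` is C_out x C_in x k x k, while the Keras `Conv2D` kernel is k x k x C_in x C_out), but the count is the same.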

After training, filters in early layers typically resemble Gabor-like edge detectors (oriented bars at different spatial frequencies). Deeper filters respond to complex compositions: eyes, wheels, fur textures.
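To illustrate what an oriented, frequency-tuned filter looks like, the sketch below hand-builds a cosine-phase Gabor filter with NumPy and measures its response (the patch-filter dot product) to vertical and horizontal stripes. It is an illustration only: the kernel is constructed by hand, not learned, and `gabor_kernel` with its `theta`, `wavelength` and `sigma` parameters is a hypothetical helper.

```python
import numpy as np

def gabor_kernel(size=7, theta=0.0, wavelength=4.0, sigma=2.0):
    """Hand-built Gabor filter: a cosine grating under a Gaussian envelope.

    theta sets the orientation; wavelength sets the stripe spacing (spatial frequency).
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_rot = x * np.cos(theta) + y * np.sin(theta)    # axis across the stripes
    y_rot = -x * np.sin(theta) + y * np.cos(theta)   # axis along the stripes
    envelope = np.exp(-(x_rot**2 + y_rot**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * x_rot / wavelength)

# 7 x 7 test patches: intensity varying along x (vertical bars) vs along y (horizontal bars)
y, x = np.mgrid[-3:4, -3:4].astype(float)
vertical_bars = np.cos(2 * np.pi * x / 4.0)
horizontal_bars = np.cos(2 * np.pi * y / 4.0)

g = gabor_kernel(theta=0.0)               # tuned to vertical bars with wavelength 4
resp_v = np.sum(g * vertical_bars)        # dot product with a matching pattern
resp_h = np.sum(g * horizontal_bars)      # dot product with an orthogonal pattern
print(resp_v)  # about 10.24
print(resp_h)  # about 0.045
```

Rotating the filter with `theta=np.pi / 2` swaps the two responses, and changing `wavelength` tunes it to a different stripe spacing; a bank of such filters at several orientations and frequencies is roughly what trained first layers end up resembling.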

Feature maps

When a single filter slides over the input, it produces a 2-D grid of scalars, one number per output position. This grid is the feature map for that filter. Its value at position (i, j) measures how strongly the filter's pattern is present in the corresponding k x k input patch (its receptive field); with stride 1 and no padding, that patch starts at input location (i, j).
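A minimal NumPy sketch of one filter sliding over a single-channel input, assuming stride 1 and no padding. Like most deep learning frameworks, it computes cross-correlation (no kernel flip). The hand-set Sobel-style vertical-edge filter stands in for learned weights, and `feature_map` is a hypothetical helper name.

```python
import numpy as np

def feature_map(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide one 2-D filter over a single-channel image (stride 1, no padding)."""
    k = kernel.shape[0]
    h_out = image.shape[0] - k + 1
    w_out = image.shape[1] - k + 1
    out = np.zeros((h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = image[i:i + k, j:j + k]      # k x k receptive field
            out[i, j] = np.sum(patch * kernel)   # dot product of patch and filter
    return out

# Toy 6 x 6 image: dark left half (0), bright right half (1), so one vertical edge
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-set vertical-edge filter, a stand-in for learned weights
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]])

fmap = feature_map(image, kernel)
print(fmap.shape)         # (4, 4)
print(fmap[0].tolist())   # [0.0, 4.0, 4.0, 0.0]
```

Every row of the map is the same: positions whose 3 x 3 patch straddles the dark-to-bright boundary score 4, and positions over uniform regions score 0, so the map peaks exactly where the vertical-edge pattern occurs.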

Feature map shape for one filter: H_out x W_out
Full layer output: H_out x W_out x C_out — a stack of C_out feature maps
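Extending the idea to a whole layer, the sketch below applies C_out filters of shape k x k x C_in to an H x W x C_in input and stacks the resulting maps on the last axis. It adds a stride s and zero padding p, which the text does not discuss; under those assumptions the spatial size is H_out = floor((H + 2p - k) / s) + 1, and likewise for W_out. The helper `conv2d_layer` and the channels-last layout are illustrative choices, and the double loop favours clarity over speed.

```python
import numpy as np

def conv2d_layer(x, w, b, stride=1, padding=0):
    """Naive conv layer in channels-last layout (illustrative sketch).

    x: (H, W, C_in) input; w: (k, k, C_in, C_out) filters; b: (C_out,) biases.
    Returns (H_out, W_out, C_out): one feature map per filter, stacked on the last axis.
    """
    k, _, c_in, c_out = w.shape
    assert x.shape[2] == c_in, "each filter must span all input channels"
    x = np.pad(x, ((padding, padding), (padding, padding), (0, 0)))  # zero-pad H and W only
    h_out = (x.shape[0] - k) // stride + 1
    w_out = (x.shape[1] - k) // stride + 1
    out = np.empty((h_out, w_out, c_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[i * stride:i * stride + k, j * stride:j * stride + k, :]  # (k, k, C_in)
            # Contract the patch with every filter at once: (k, k, C_in) . (k, k, C_in, C_out) -> (C_out,)
            out[i, j, :] = np.tensordot(patch, w, axes=3) + b
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32, 3))    # e.g. a 32 x 32 RGB input
w = rng.standard_normal((3, 3, 3, 16))  # 16 filters, each 3 x 3 x 3
b = np.zeros(16)

y = conv2d_layer(x, w, b, stride=1, padding=1)
y2 = conv2d_layer(x, w, b, stride=2, padding=1)
print(y.shape)   # (32, 32, 16): 16 stacked feature maps
print(y2.shape)  # (16, 16, 16)
```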

Visualising filters

The Zeiler & Fergus (2014) deconvnet visualisation, which projects feature-map activations back to pixel space, showed that:

Layer 1 filters detect oriented edges and colour blobs
Layer 2 detects corners and simple textures
Layer 3+ detects increasingly complex and class-specific patterns

Relationship to channels

Input channels and output channels have different roles:

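Input channels (C_in): in a standard (non-grouped) convolution, each filter spans all input channels at once, whether those are colour channels (first layer) or the previous layer's feature maps (deeper layers)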
Output channels (C_out): each filter produces one feature map; all C_out maps together form the new representation
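The sketch below checks, assuming stride 1 and no padding, that one k x k x C_in filter produces a single 2-D map equal to the sum of per-channel 2-D cross-correlations: the filter mixes information across all input channels rather than keeping them separate. `corr2d` and `one_filter_map` are hypothetical helper names, and the random input stands in for an RGB image or a stack of prior-layer feature maps.

```python
import numpy as np

def corr2d(channel, kernel2d):
    """Single-channel 2-D cross-correlation, stride 1, no padding."""
    k = kernel2d.shape[0]
    h_out, w_out = channel.shape[0] - k + 1, channel.shape[1] - k + 1
    return np.array([[np.sum(channel[i:i + k, j:j + k] * kernel2d)
                      for j in range(w_out)] for i in range(h_out)])

def one_filter_map(x, filt):
    """Feature map of ONE k x k x C_in filter over an H x W x C_in input."""
    k = filt.shape[0]
    h_out, w_out = x.shape[0] - k + 1, x.shape[1] - k + 1
    return np.array([[np.sum(x[i:i + k, j:j + k, :] * filt)  # dot product over k, k and C_in
                      for j in range(w_out)] for i in range(h_out)])

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 3))     # e.g. an 8 x 8 input with C_in = 3
filt = rng.standard_normal((3, 3, 3))  # one filter spanning all 3 input channels

full = one_filter_map(x, filt)
per_channel_sum = sum(corr2d(x[:, :, c], filt[:, :, c]) for c in range(3))
print(full.shape)                          # (6, 6): one 2-D map, not one per input channel
print(np.allclose(full, per_channel_sum))  # True
```

This is why adding input channels deepens each filter (and grows the parameter count) but never changes the number of output maps; only C_out does that.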
