datarekha
Deep Learning · Medium · Asked at Google, Amazon, Meta

What is an autoencoder and what is it used for?

The short answer

An autoencoder is a neural network trained to compress its input into a low-dimensional bottleneck (the encoder) and then reconstruct the original input from that bottleneck (the decoder). Because it learns a compact representation without labels, it is useful for dimensionality reduction, anomaly detection, and as a component of generative models.

How to think about it

The training signal is reconstruction error, typically mean squared error for continuous inputs or binary cross-entropy for binary inputs. Because the input serves as its own target, no labels are needed, which makes autoencoders a form of self-supervised learning.
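To make the two loss choices concrete, here is a minimal sketch of how each could be computed in PyTorch. The batch size, the 784-feature shape, and the random tensors standing in for real inputs and decoder outputs are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical batch: 4 inputs with 784 features each, scaled to [0, 1].
x = torch.rand(4, 784)
# Stand-in for a decoder output; a real model would produce this tensor.
x_hat = torch.rand(4, 784)

# Mean squared error: a common choice for continuous-valued inputs.
mse = F.mse_loss(x_hat, x)

# Binary cross-entropy: suited to binary (or [0, 1]-valued) inputs.
# Predictions must lie in [0, 1], e.g. after a final sigmoid layer.
bce = F.binary_cross_entropy(x_hat, x)

print(f"MSE: {mse.item():.4f}  BCE: {bce.item():.4f}")
```

In both calls the prediction comes first and the target second; the target is simply the input itself.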

Input x  →  Encoder f  →  Latent z (bottleneck)  →  Decoder g  →  Reconstruction x̂
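Loss = ||x - x̂||²

import torch.nn as nn

class Autoencoder(nn.Module):
    """Fully connected autoencoder for 784-feature inputs (e.g. flattened 28x28 images)."""

    def __init__(self):
        super().__init__()
        # Encoder: compress 784 features down to a 32-d latent code.
        self.encoder = nn.Sequential(
            nn.Linear(784, 256), nn.ReLU(),
            nn.Linear(256, 32),             # bottleneck: 32-d latent
        )
        # Decoder: mirror the encoder to map the latent code back to 784 features.
        self.decoder = nn.Sequential(
            nn.Linear(32, 256), nn.ReLU(),
            nn.Linear(256, 784), nn.Sigmoid(),  # outputs in [0, 1], so scale inputs to [0, 1]
        )

    def forward(self, x):
        z = self.encoder(x)      # latent representation
        return self.decoder(z)   # reconstruction x̂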
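
A short training loop shows how the reconstruction loss drives learning. This is a sketch under assumptions: the `nn.Sequential` model is a compact stand-in with the same 784 → 32 → 784 shape as the class above, random data replaces a real dataset, and the Adam optimiser, learning rate, and step count are illustrative choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Compact stand-in with the same layer sizes as the Autoencoder class.
model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 32),                  # bottleneck
    nn.Linear(32, 256), nn.ReLU(),
    nn.Linear(256, 784), nn.Sigmoid(),
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed settings
loss_fn = nn.MSELoss()

# Random values in [0, 1] stand in for a real batch (assumption).
data = torch.rand(64, 784)

losses = []
for step in range(20):
    optimizer.zero_grad()
    x_hat = model(data)
    loss = loss_fn(x_hat, data)   # the input is its own target
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

The only difference from a supervised loop is the target passed to the loss: it is the input batch rather than a label.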

Variants and their uses

| Variant | Key change | Primary use |
|---|---|---|
| Vanilla AE | Bottleneck + MSE loss | Dimensionality reduction, visualisation |
| Denoising AE | Input corrupted; reconstruct clean | Robust features, data cleaning |
| Sparse AE | L1 penalty on z | Feature learning, interpretability |
| VAE | z is a distribution (reparameterisation) | Generation, controllable latent space |
| Convolutional AE | Conv layers in encoder/decoder | Image compression and reconstruction |
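
Two of these variants change only the loss computation, which a short sketch can illustrate. The Gaussian noise model, `noise_std`, and `l1_weight` below are hypothetical choices, not values from the table, and the random batch stands in for real data.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 32))
decoder = nn.Sequential(
    nn.Linear(32, 256), nn.ReLU(),
    nn.Linear(256, 784), nn.Sigmoid(),
)

x = torch.rand(8, 784)  # placeholder batch scaled to [0, 1]

# Denoising AE: corrupt the input, but score against the clean original.
noise_std = 0.2  # hypothetical noise level
x_noisy = (x + noise_std * torch.randn_like(x)).clamp(0.0, 1.0)
denoising_loss = F.mse_loss(decoder(encoder(x_noisy)), x)

# Sparse AE: add an L1 penalty on the latent code z to the usual loss.
l1_weight = 1e-3  # hypothetical penalty weight
z = encoder(x)
sparse_loss = F.mse_loss(decoder(z), x) + l1_weight * z.abs().mean()
```

Either loss can replace the plain reconstruction loss in an otherwise unchanged training loop.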

Anomaly detection

Train an autoencoder on normal data only. At inference, abnormal inputs tend to be harder to reconstruct, so their reconstruction error is elevated and can serve as an anomaly score: inputs whose error exceeds a threshold chosen on normal data are flagged. This is a common approach in industrial defect detection and network intrusion detection.
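
The scoring step can be sketched as follows. The helper names `reconstruction_errors` and `fit_threshold`, the 99th-percentile threshold, and the hand-set `ProjectionAE` (a stand-in for a trained model, used so the example is deterministic) are all assumptions for illustration.

```python
import torch
import torch.nn as nn


def reconstruction_errors(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Per-sample mean squared reconstruction error, used as the anomaly score."""
    model.eval()
    with torch.no_grad():
        x_hat = model(x)
    return ((x - x_hat) ** 2).mean(dim=1)


def fit_threshold(scores_on_normal: torch.Tensor, quantile: float = 0.99) -> float:
    """Pick the threshold as a high quantile of scores on held-out normal data."""
    return torch.quantile(scores_on_normal, quantile).item()


class ProjectionAE(nn.Module):
    """Stand-in for a trained model: keeps only the first 2 of 8 features,
    as if it had learned that normal data varies along those two."""

    def __init__(self, dim: int = 8, latent: int = 2):
        super().__init__()
        self.encoder = nn.Linear(dim, latent, bias=False)
        self.decoder = nn.Linear(latent, dim, bias=False)
        with torch.no_grad():
            self.encoder.weight.copy_(torch.eye(latent, dim))
            self.decoder.weight.copy_(torch.eye(dim, latent))

    def forward(self, x):
        return self.decoder(self.encoder(x))


torch.manual_seed(0)
model = ProjectionAE()

# Synthetic "normal" data: strong signal in 2 features, tiny noise elsewhere.
normal = 0.01 * torch.randn(200, 8)
normal[:, :2] = torch.randn(200, 2)
# Synthetic anomalies: large values in every feature.
anomalies = torch.randn(5, 8)

threshold = fit_threshold(reconstruction_errors(model, normal))
is_anomaly = reconstruction_errors(model, anomalies) > threshold
print(is_anomaly)
```

The quantile trades false alarms against missed anomalies; in practice it would be tuned on validation data rather than fixed at 0.99.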

Relationship to VAEs and diffusion

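A standard autoencoder places no constraint on the structure of the latent space, so interpolating between two latent vectors, or decoding a randomly sampled one, can produce implausible outputs. VAEs regularise the latent space by pulling the encoder's distribution over z towards a standard Gaussian prior (via a KL-divergence term), which enables smooth interpolation and sampling. Diffusion models in their basic form have no bottleneck and operate in the full data space, although latent diffusion models run the diffusion process in the latent space of a pretrained autoencoder.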
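
The VAE's regularisation can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the layer sizes mirror the autoencoder above, and the class name `VAE`, the helper `vae_loss`, and the binary cross-entropy reconstruction term are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VAE(nn.Module):
    def __init__(self, in_dim: int = 784, hidden: int = 256, latent: int = 32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent)       # mean of q(z|x)
        self.logvar = nn.Linear(hidden, latent)   # log-variance of q(z|x)
        self.dec = nn.Sequential(
            nn.Linear(latent, hidden), nn.ReLU(),
            nn.Linear(hidden, in_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterisation: z = mu + sigma * eps keeps sampling differentiable.
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * logvar) * eps
        return self.dec(z), mu, logvar


def vae_loss(x, x_hat, mu, logvar):
    # Reconstruction term plus KL divergence from q(z|x) to a standard Gaussian.
    recon = F.binary_cross_entropy(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return (recon + kl) / x.size(0)


torch.manual_seed(0)
vae = VAE()
x = torch.rand(4, 784)  # placeholder batch in [0, 1]
x_hat, mu, logvar = vae(x)
loss = vae_loss(x, x_hat, mu, logvar)

# Generation: decode latent vectors drawn from the standard Gaussian prior.
with torch.no_grad():
    samples = vae.dec(torch.randn(4, 32))
```

The KL term is what a plain autoencoder lacks: it keeps encoded points close to the prior, so vectors drawn from that prior decode to plausible outputs once the model is trained.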
