What is an autoencoder and what is it used for?
An autoencoder is a neural network trained to compress input into a low-dimensional bottleneck (encoder) and then reconstruct the original input from that bottleneck (decoder). It learns a compact representation without labels, making it useful for dimensionality reduction, anomaly detection, and as a component of generative models.
How to think about it
The training signal is reconstruction error: typically mean squared error for continuous inputs, or binary cross-entropy for binary inputs or inputs scaled to [0, 1]. No labels are needed because the input is its own target, which is why autoencoders are described as a form of unsupervised or self-supervised learning.
Input x → Encoder f → Latent z (bottleneck) → Decoder g → Reconstruction x̂
Loss = ||x - x̂||²
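import torch.nn as nn


class Autoencoder(nn.Module):
    """Fully connected autoencoder for 784-dimensional inputs."""

    def __init__(self):
        super().__init__()
        # Encoder: 784 -> 256 -> 32
        self.encoder = nn.Sequential(
            nn.Linear(784, 256), nn.ReLU(),
            nn.Linear(256, 32),  # bottleneck: 32-d latent
        )
        # Decoder mirrors the encoder; Sigmoid keeps outputs in [0, 1]
        self.decoder = nn.Sequential(
            nn.Linear(32, 256), nn.ReLU(),
            nn.Linear(256, 784), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)     # latent code
        return self.decoder(z)  # reconstruction x̂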
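To make the training signal concrete, here is a minimal training-loop sketch. It rebuilds the same layer sizes as the model above with `nn.Sequential` so that it runs on its own. The helper names `make_autoencoder` and `train`, the random stand-in data, the Adam optimiser, and the step count are all assumptions for illustration, not part of the original example. Note that `nn.MSELoss` averages the squared error over elements rather than summing it.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # reproducible demo


def make_autoencoder(in_dim=784, hidden=256, latent=32):
    """Encoder/decoder pair with the same layer sizes as the model above."""
    encoder = nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, latent),
    )
    decoder = nn.Sequential(
        nn.Linear(latent, hidden), nn.ReLU(),
        nn.Linear(hidden, in_dim), nn.Sigmoid(),
    )
    return nn.Sequential(encoder, decoder)


def train(model, data, steps=100, lr=1e-3):
    """Minimise mean squared reconstruction error; the input is its own target."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    losses = []
    for _ in range(steps):
        x_hat = model(data)          # reconstruction
        loss = loss_fn(x_hat, data)  # compare against the input itself
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses


# Hypothetical stand-in data: 64 random vectors in [0, 1]. Real inputs would be
# something like flattened, normalised images.
data = torch.rand(64, 784)
model = make_autoencoder()
losses = train(model, data)
print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

For binary inputs, swapping `nn.MSELoss()` for `nn.BCELoss()` gives the binary cross-entropy variant, since the decoder already ends in a sigmoid.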
Variants and their uses
| Variant | Key change | Primary use |
|---|---|---|
| Vanilla AE | Baseline: bottleneck with reconstruction loss (e.g. MSE) | Dimensionality reduction, visualisation |
| Denoising AE | Input corrupted; reconstruct clean | Robust features, data cleaning |
| Sparse AE | L1 penalty on z | Feature learning, interpretability |
| VAE | Encoder outputs a distribution over z (sampled via reparameterisation, KL-regularised) | Generation, controllable latent space |
| Convolutional AE | Conv layers in encoder/decoder | Image compression and reconstruction |
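Two of the variants in the table change only the loss computation, which a short sketch can show. The function names `denoising_loss` and `sparse_loss`, the Gaussian corruption, and the default `noise_std` and `l1_weight` values are assumptions chosen for illustration; other corruption schemes (such as masking) and other sparsity penalties are also used in practice.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 32))
decoder = nn.Sequential(
    nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 784), nn.Sigmoid()
)


def denoising_loss(x, noise_std=0.3):
    """Denoising AE: corrupt the input, but score against the clean target."""
    x_noisy = (x + noise_std * torch.randn_like(x)).clamp(0.0, 1.0)
    x_hat = decoder(encoder(x_noisy))
    return F.mse_loss(x_hat, x)


def sparse_loss(x, l1_weight=1e-3):
    """Sparse AE: reconstruction error plus an L1 penalty on the latent code z."""
    z = encoder(x)
    x_hat = decoder(z)
    return F.mse_loss(x_hat, x) + l1_weight * z.abs().mean()


x = torch.rand(16, 784)  # hypothetical stand-in batch with values in [0, 1]
print(denoising_loss(x).item(), sparse_loss(x).item())
```

Either function can replace the plain reconstruction loss in an ordinary training loop; the architecture itself is unchanged.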
Anomaly detection
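Train an autoencoder on normal data only. At inference, inputs unlike the training data tend to reconstruct poorly, so their reconstruction error is elevated and can serve as an anomaly score; inputs whose score exceeds a threshold chosen on held-out normal data are flagged. The approach is commonly applied to industrial defect detection and network intrusion detection.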
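The scoring step can be sketched as follows. Everything here is a toy assumption: the 20-dimensional synthetic data, the tiny model, the helper names `reconstruction_scores` and `fit_threshold`, and the choice of the 95th percentile of held-out normal scores as the threshold. In a real system the threshold would be tuned to the acceptable false-alarm rate.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


def reconstruction_scores(model, x):
    """Per-sample mean squared reconstruction error, used as the anomaly score."""
    model.eval()
    with torch.no_grad():
        x_hat = model(x)
    return ((x - x_hat) ** 2).mean(dim=1)


def fit_threshold(model, normal_val, quantile=0.95):
    """Pick the threshold as a high quantile of scores on held-out normal data."""
    return torch.quantile(reconstruction_scores(model, normal_val), quantile)


# Hypothetical toy setup: "normal" points lie near one fixed 20-d pattern.
pattern = torch.rand(20)
normal_train = pattern + 0.05 * torch.randn(256, 20)
normal_val = pattern + 0.05 * torch.randn(100, 20)
unusual = torch.rand(10, 20)  # unrelated random points standing in for anomalies

model = nn.Sequential(nn.Linear(20, 4), nn.ReLU(), nn.Linear(4, 20))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):  # train on normal data only
    loss = ((model(normal_train) - normal_train) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

threshold = fit_threshold(model, normal_val)
scores = reconstruction_scores(model, unusual)
flagged = scores > threshold  # boolean mask: True where the input looks anomalous
print(f"threshold {threshold.item():.4f}, flagged {int(flagged.sum())} of {len(unusual)}")
```

Fitting the threshold on held-out normal data, rather than on the training set, avoids setting it too low because of overfitting to the training examples.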
Relationship to VAEs and diffusion
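A standard autoencoder places no constraint on the structure of the latent space, so interpolating between two latent vectors, or sampling a random one, can decode to implausible outputs. VAEs regularise the encoder's output distribution towards a Gaussian prior, which makes the latent space smoother and enables interpolation and sampling. Diffusion models have no bottleneck: they denoise a variable of the same dimensionality as the data they are trained on, which is the full data space for pixel-space models, although latent diffusion models run the process inside the latent space of a pretrained autoencoder.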
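The VAE regularisation mentioned above comes down to two small pieces, sketched here under the common assumption of a diagonal Gaussian encoder and a standard normal prior. The function names and the hand-written `mu` and `logvar` tensors are illustrative stand-ins for real encoder outputs.

```python
import torch

torch.manual_seed(0)


def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), so gradients reach mu and logvar."""
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + std * eps


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) per sample, summed over latent dimensions."""
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1)


# Hypothetical encoder outputs for a batch of 4 inputs with a 3-d latent space.
mu = torch.ones(4, 3, requires_grad=True)
logvar = torch.zeros(4, 3, requires_grad=True)

z = reparameterize(mu, logvar)          # fed to the decoder during training
kl = kl_to_standard_normal(mu, logvar)  # added to the reconstruction loss
print(z.shape, kl)
```

The training objective is then the reconstruction loss plus the KL term (often with a weighting factor), and new samples are generated by decoding draws from the standard normal prior.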