Deep Learning interview questions
63 of the most common Deep Learning questions for data and AI interviews — each with a worked answer, the trap to avoid, and a link to learn it properly. Neural nets, backprop, optimization, transformers.
- What is data augmentation in computer vision and which techniques are most effective? Easy ·Google·Meta·NVIDIA
- When should you use deep learning vs classical machine learning? Easy ·Google·Amazon·Netflix
- How does dropout work, and why must it behave differently during training and inference? Easy ·Google·Meta·Amazon
- What is early stopping, and how does it prevent overfitting? Easy ·Google·Amazon·Microsoft
- What is the difference between an epoch, an iteration, and a step in deep learning training? Easy
- What are filters and feature maps in a CNN, and what do they represent? Easy ·Google·Meta·Amazon
- Walk me through the forward pass of a neural network end-to-end. Easy ·Amazon·Microsoft·Apple
- What is gradient clipping, and when is it necessary? Easy ·Google·OpenAI·Meta
- What is L2 regularisation (weight decay), and how does it reduce overfitting? Easy ·Google·Amazon·Apple
- What is pooling, and when would you choose max pooling over average pooling? Easy ·Google·NVIDIA·Meta
- Why does training loss keep falling while validation loss rises? Easy ·Google·Meta·Amazon
- What do the query, key, and value vectors represent in attention? Easy ·Google·OpenAI·Meta
- Why do we scale by sqrt(d_k) in scaled dot-product attention? Easy ·Google·OpenAI·Meta
- What does softmax do, and why is it used in the output layer? Easy ·Google·Microsoft·Apple
- What do stride and padding control in a convolutional layer? Easy ·Google·Meta·Amazon
- What are embeddings and why are they central to modern deep learning? Easy ·Google·OpenAI·Meta
- What does a single artificial neuron (perceptron) actually compute? Easy ·Google·Meta·Amazon
- What does a convolution operation do in a CNN? Easy ·Google·Meta·NVIDIA
- Why do neural networks need activation functions at all? Easy ·Google·OpenAI·Microsoft
- Why do CNNs outperform fully-connected networks on image data? Easy ·Google·Meta·Amazon
- Why are GPUs used for deep learning instead of CPUs? Easy ·NVIDIA·Google·Meta
- Why does a transformer need positional encoding? Easy ·Google·OpenAI·Meta
- What is an autoencoder and what is it used for? Medium ·Google·Amazon·Meta
- What is batch normalisation, and why does it help training? Medium ·Google·Meta·Apple
- How does batch size affect training — speed, convergence, and generalisation? Medium ·Google·Meta·NVIDIA
- How do you handle severe class imbalance when training a deep learning model? Medium ·Stripe·Google·Amazon
- How do you count the number of trainable parameters in a convolutional layer? Medium ·Google·Meta·Amazon
- Why use cross-entropy loss instead of MSE for classification? Medium ·Google·Meta·Amazon
- What is the dying ReLU problem and how do you prevent it? Medium ·Meta·NVIDIA·Google
- What is the difference between encoder-only, decoder-only, and encoder-decoder transformer architectures? Medium ·Google·OpenAI·Meta
- What causes exploding gradients, and how does gradient clipping mitigate them? Medium ·Google·Meta·OpenAI
- What is GELU, and why is it often preferred over ReLU in transformer models? Medium ·Google·OpenAI·NVIDIA
- What is gradient accumulation and when do you need it? Medium ·Google·Meta·Hugging Face
- How do you train a deep learning model when you have very little labelled data? Medium ·Google·Meta·Apple
- What is a learning rate schedule, and why is warmup important? Medium ·Google·Meta·OpenAI
- How do LSTM gates mitigate the vanishing gradient problem? Medium ·Google·Meta·Amazon
- What is mixed precision training and why does it matter? Medium ·NVIDIA·Google·Meta
- Why use multiple attention heads instead of one large attention operation? Medium ·Google·OpenAI·Meta
- What is a 1x1 convolution and why is it useful? Medium ·Google·Meta·NVIDIA
- What causes overfitting in deep neural networks and how do you fight it? Medium ·Google·Meta·Amazon
- What is the receptive field of a neuron in a CNN and how does it grow with depth? Medium ·Google·Meta·NVIDIA
- Compare sigmoid, tanh, ReLU, leaky ReLU, and GELU — when would you pick each? Medium ·Google·Meta·OpenAI
- What roles do residual connections and layer normalisation play in transformer training? Medium ·Google·OpenAI·Meta
- Describe the ResNet architecture and explain the key design choices that made it work. Medium ·Google·Meta·NVIDIA
- What are the concrete reasons transformers outperform RNNs on most sequence tasks? Medium ·Google·OpenAI·Meta
- How do SGD, SGD with momentum, and RMSProp differ, and what does each one fix? Medium ·Google·DeepMind·Amazon
- Why does sigmoid saturation cause vanishing gradients, and why is tanh only a partial fix? Medium ·Google·Amazon·Meta
- What are skip connections in ResNet and why were they necessary? Medium ·Google·Meta·NVIDIA
- How does transfer learning work for computer vision tasks? Medium ·Google·Meta·NVIDIA
- Walk me through the transformer encoder architecture block by block. Medium ·Google·OpenAI·Meta
- What is the vanishing gradient problem and how do you fix it? Medium ·Google·Meta·OpenAI
- Why does weight initialization matter and how do Xavier and He initialization work? Medium ·Google·NVIDIA·Microsoft
- What does the Adam optimizer do, and what problem does it solve over SGD? Medium ·Google·Meta·OpenAI
- What is backpropagation and how does the chain rule make it work? Medium ·Google·Meta·OpenAI
- What does self-attention actually compute, and why is it useful? Medium ·Google·OpenAI·Meta
- What is transfer learning and when should you use full fine-tuning vs feature extraction? Medium ·Google·OpenAI·Meta
- Why do vanilla RNNs struggle with long sequences? Medium ·Google·Meta·Amazon
- Why is standard self-attention O(n^2) in sequence length, and how is it addressed? Hard ·Google·OpenAI·Meta
- Your model's training loss isn't dropping at all. How do you systematically debug it? Hard ·Google·Meta·OpenAI
- Why does depth often help more than width for learning complex functions? Hard ·Google·Meta·OpenAI
- What are the high-level differences between GANs, VAEs, and diffusion models? Hard ·OpenAI·Google·Stability AI
- How does LoRA work and why is it preferred over full fine-tuning for large models? Hard ·Microsoft·Meta·Hugging Face
- What does the Universal Approximation Theorem guarantee — and what doesn't it guarantee? Hard ·Google·OpenAI·DeepMind