NLP & LLMs interview questions
37 of the most common NLP & LLM questions for data and AI interviews, each with a worked answer, the trap to avoid, and a link to learn it properly. Topics include embeddings, attention, RAG, prompting, and evaluation.
- What is chain-of-thought prompting and when does it help? Easy
- What is a context window in an LLM and why does its size matter? Easy ·OpenAI·Anthropic·Google
- Why is cosine similarity preferred over Euclidean distance for comparing text vectors? Easy ·Google·Amazon·Microsoft
- How does an LLM generate text — what is next-token prediction and autoregression? Easy ·OpenAI·Anthropic·Google
- What are n-grams and when should you use them in NLP? Easy ·Google·Amazon
- What prompt engineering techniques should every LLM practitioner know? Easy
- What is sequence padding and why is it necessary for batch training? Easy ·Google·Amazon·Meta
- What is the difference between stemming and lemmatization? Easy ·Amazon·Microsoft
- What are stop words and when should you remove them? Easy ·Amazon·Microsoft
- How do you reliably get structured outputs (JSON, typed objects) from an LLM? Easy ·OpenAI·Anthropic·Google
- What is TF-IDF and how does it improve on raw bag-of-words counts? Easy ·Google·Amazon·Microsoft
- What are tokens in an LLM and why is API pricing per token rather than per word or character? Easy ·OpenAI·Anthropic·Google
- What do 'parameters' mean in a language model and what do they actually store? Easy ·OpenAI·Google·Meta
- What is a vector database and how does it enable semantic retrieval? Easy
- What is Retrieval-Augmented Generation (RAG) and why is it used? Easy ·OpenAI·Google·Microsoft
- What is tokenization in NLP and why does it matter? Easy ·Google·Amazon·Meta
- Why do dense word embeddings outperform one-hot vectors? Easy ·Google·Meta·Amazon
- What is the difference between encoder models like BERT and decoder models like GPT? Medium ·Google·OpenAI·Meta
- How does Byte-Pair Encoding (BPE) tokenization work? Medium ·OpenAI·Google·Meta
- What chunking strategies exist for RAG and how do you choose between them? Medium
- How do you evaluate the quality of an LLM or RAG system? Medium ·Cohere·Databricks·Anthropic
- When should you use prompt engineering versus fine-tuning to adapt an LLM? Medium ·OpenAI·Anthropic·Google
- How does GloVe differ from Word2Vec in learning word embeddings? Medium ·Google·Stanford·Meta
- What are out-of-vocabulary (OOV) words and how do modern NLP systems handle them? Medium ·Google·Meta·Amazon
- What is the difference between pretraining, fine-tuning, instruction-tuning, and RLHF? Medium ·OpenAI·Anthropic·Google
- What is prompt injection and how do you defend against it? Medium ·Anthropic·OpenAI·Microsoft
- What techniques reduce LLM cost and latency in production? Medium ·OpenAI·Anthropic·Google
- What is hybrid search and when should you use semantic vs keyword retrieval? Medium
- What is the difference between temperature, top-k, and top-p sampling in LLMs? Medium ·OpenAI·Anthropic·Google
- How do function/tool calling and LLM agents work at a high level? Medium ·OpenAI·Anthropic·Google
- What causes LLM hallucinations and how can they be reduced? Medium ·OpenAI·Anthropic·Google
- How does Word2Vec work, and what is the difference between Skip-gram and CBOW? Medium ·Google·Meta·Amazon
- Why are smaller language models (SLMs) sometimes preferable to larger ones? Medium ·Google·Meta·Anthropic
- What is the key difference between Word2Vec embeddings and BERT's contextual embeddings? Medium ·Google·Meta·OpenAI
- What is the KV cache in a transformer and why does it matter for inference? Hard ·OpenAI·Anthropic·Google
- When should you use RAG vs fine-tuning vs a long-context model? Hard ·OpenAI·Anthropic·Google
- How does RLHF work and what problem does it solve? Hard ·OpenAI·Anthropic·Google