Machine Learning interview questions
80 of the most common Machine Learning questions for data and AI interviews, each with a worked answer, the trap to avoid, and a link to learn it properly. Topics span models, evaluation, regularization, and the bias–variance tradeoff.
- What is the accuracy paradox and how does it expose the failure of accuracy as a metric? Easy ·Google·Amazon·Microsoft
- What is the difference between classification and regression, and how do you choose between them? Easy ·Amazon·Microsoft·Apple
- What is a confusion matrix and what four quantities does it report? Easy ·Google·Amazon·Meta
- How do you extract useful features from datetime columns for a machine learning model? Easy ·Amazon·Uber·Airbnb
- What is the difference between Gini impurity and entropy as splitting criteria in decision trees? Easy ·Google·Amazon·Meta
- How do you handle skewed features in a machine learning dataset, and why does skew matter? Easy ·Amazon·Flipkart·Walmart
- How does k-means clustering work? Easy ·Google·Amazon·Meta
- What is information gain and how does it relate to entropy in a decision tree split? Easy ·Google·Amazon·Microsoft
- How does k-nearest neighbours work, and why is it called a lazy learner? Easy ·Amazon·Google
- How does Naive Bayes work, and why is it called 'naive'? Easy ·Google·Amazon
- When should you use one-hot encoding versus label encoding for categorical features? Easy ·Amazon·Google·Flipkart
- What is the out-of-bag error in a random forest and how reliable is it as a validation estimate? Easy ·Amazon·Apple·Microsoft
- What are overfitting and underfitting, and how do you fix each? Easy ·Google·Meta·Netflix
- Why does R-squared never decrease when you add features, and when should you use adjusted R-squared instead? Easy ·Amazon·Walmart·Capital One
- What is the difference between standardization and normalization, and which models require feature scaling? Easy ·Google·Amazon·Meta
- What is the difference between supervised, unsupervised, and reinforcement learning? Easy ·Google·Amazon·Meta
- Why do we split data into train, validation, and test sets, and what are the typical proportions? Easy ·Google·Amazon·Meta
- Why is linear regression unsuitable for binary classification, and what specific problems does logistic regression fix? Easy ·Google·Meta·Microsoft
- What is the difference between bagging and boosting, and what error component does each primarily reduce? Medium ·Google·Amazon·Meta
- What is the bias–variance tradeoff? Medium ·Google·Amazon·Microsoft
- How do you choose the number of clusters k in k-means? Medium ·Amazon·Airbnb·Uber
- What is the curse of dimensionality, and how does it affect machine learning models? Medium ·Google·Meta·Microsoft
- What is data leakage in machine learning, and what are the most common ways it occurs? Medium ·Google·Amazon·Meta
- What is pruning in decision trees and when would you use pre-pruning versus post-pruning? Medium ·Google·Uber
- Walk me through exactly how a decision tree chooses a split at each node. Medium ·Amazon·Microsoft·Apple
- What is the difference between discriminative and generative models, and when would you prefer each? Medium ·Google·OpenAI·Meta
- How does early stopping work in gradient boosting, and why is it necessary? Medium ·Google·Amazon·Meta
- What problem does ElasticNet solve that neither Lasso nor Ridge can handle alone? Medium ·Netflix·Airbnb·Two Sigma
- What is the F1 score, why use the harmonic mean, and when is it the wrong metric? Medium ·Google·Amazon·Microsoft
- Why does regularization require feature scaling, and what happens if you skip it? Medium ·Amazon·Microsoft·Airbnb
- What are filter, wrapper, and embedded feature selection methods, and when do you use each? Medium ·Google·Amazon·Microsoft
- Explain how gradient boosting fits residuals. What role does the learning rate play? Medium ·Google·Amazon·Meta
- When should you use gradient descent over the normal equation to fit a linear regression? Medium ·Google·Amazon·Apple
- When should you use grid search vs random search vs Bayesian optimisation for hyperparameter tuning? Medium ·Google·Meta·Amazon
- How do decision trees and gradient boosting libraries handle categorical features natively, and when is label encoding safe? Medium ·Google·Uber·Airbnb
- How do you handle class imbalance in a machine-learning model? Medium ·Google·Amazon·Meta
- What are the strategies for handling missing values in a machine learning pipeline, and how do you choose between them? Medium ·Google·Amazon·Microsoft
- How do hierarchical clustering and DBSCAN differ from k-means? Medium ·Amazon·Palantir
- How do you handle high-cardinality categorical features in machine learning? Medium ·Airbnb·Uber·Amazon
- How does a random forest work, and why does feature sampling at each split help more than row sampling alone? Medium ·Google·Amazon·Meta
- What is k-means++ and why is it better than random initialisation? Medium ·Google·Amazon
- What are the main limitations of k-means clustering? Medium ·Google·Netflix·Meta
- How does the curse of dimensionality affect KNN? Medium ·Google·Meta·Microsoft
- What is the fundamental difference between L1 (Lasso) and L2 (Ridge) regularization, and when do you choose each? Medium ·Google·Amazon·Meta
- What are the core assumptions of linear regression, and what breaks when each is violated? Medium ·Google·Amazon·Meta
- What is log loss and why does it penalise confident wrong predictions more than uncertain ones? Medium ·Google·Meta·Amazon
- What are the key regression metrics — MAE, RMSE, MAPE, R² — and what are their failure modes? Medium ·Amazon·Google·Uber
- What is multicollinearity, how does it harm regression, and how do you detect and fix it? Medium ·McKinsey·Airbnb·Goldman Sachs
- How does Ordinary Least Squares derive the coefficient vector, and what is the closed-form solution? Medium ·Google·Jane Street·Two Sigma
- How do you detect and handle outliers in a machine learning dataset? Medium ·Amazon·Goldman Sachs·Walmart
- What is the difference between parametric and non-parametric models? Medium ·Google·Amazon·Apple
- When should you optimize precision and when should you optimize recall? Medium ·Google·Amazon·Meta
- When would you choose a random forest over gradient boosting (XGBoost/LightGBM), and vice versa? Medium ·Google·Amazon·Meta
- How do you select the regularization strength λ, and what does it mean to set it too high or too low? Medium ·Amazon·Apple·Spotify
- When should you use RMSE versus MAE for regression evaluation, and what does R-squared actually tell you? Medium ·Google·Amazon·Airbnb
- What is the ROC curve and what does AUC actually measure? Medium ·Google·Amazon·Meta
- What is the difference between SHAP and LIME for model interpretability? Medium ·Airbnb·Uber·Google
- Explain the relationship between the sigmoid function, odds, and log-odds in logistic regression. Medium ·Amazon·Stripe·Uber
- What is stratified k-fold cross-validation and when is it necessary? Medium ·Meta·LinkedIn·Stripe
- What does the C parameter control in a Support Vector Machine? Medium ·Amazon·Bloomberg·Microsoft
- What is target encoding, when is it better than one-hot encoding, and how does it cause data leakage? Medium ·Amazon·Airbnb·Uber
- What are the main approaches for converting raw text into features for a machine learning model? Medium ·Google·Amazon·Microsoft
- How do you choose the optimal decision threshold for a binary classifier? Medium ·Stripe·Google·Amazon
- Why can't you use standard k-fold cross-validation on time-series data, and what should you use instead? Medium ·Uber·Airbnb·Bloomberg
- What are t-SNE and UMAP, how do they differ from PCA, and what are their limitations for ML workflows? Medium ·Google·DeepMind·Meta
- What is k-fold cross-validation and when should you use it over a single train/validation split? Medium ·Google·Airbnb·Spotify
- What is generalization in machine learning, and what factors determine how well a model generalizes? Medium ·Google·DeepMind·Amazon
- What is PCA, when should you use it, and what are its key limitations? Medium ·Google·Amazon·Microsoft
- What is feature leakage and how do you prevent it during feature engineering and preprocessing? Hard ·Google·Amazon·Meta
- What are the pitfalls of impurity-based feature importance in tree ensembles, and how do you get a more reliable estimate? Hard ·Google·Meta·Uber
- What loss function does logistic regression optimize, and why is it convex? Hard ·Google·DeepMind·Jane Street
- What are MAP and NDCG, and when would you use each for evaluating a ranking system? Hard ·Google·Meta·Amazon
- What is model calibration and how do you measure and fix a poorly calibrated classifier? Hard ·Google·Meta·Stripe
- Your model performs well offline but degrades in production. How do you diagnose and fix it? Hard ·Google·Meta·Airbnb
- What does the No Free Lunch theorem state, and what are its practical implications for choosing algorithms? Hard
- What is the Precision-Recall curve, and why is it often more informative than the ROC curve on imbalanced datasets? Hard ·Google·Meta·Stripe
- What is the Bayesian interpretation of Ridge regression, and what prior does it correspond to? Hard ·Google·DeepMind·Two Sigma
- How does an SVM work, and what is the kernel trick? Hard ·Google·Microsoft·Amazon
- What are the key algorithmic differences between XGBoost and LightGBM? Hard ·Google·Meta·Uber
- What regularisation mechanisms does XGBoost add on top of standard gradient boosting? Hard ·Google·Meta·Uber