In KNN, how do you choose k, and how does the curse of dimensionality affect it?
Choose k by cross-validation, balancing a small k (low bias, high variance, noisy) against a large k (smoother, higher bias); an odd k avoids ties in binary classification. The curse of dimensionality hurts KNN because in high dimensions all points become nearly equidistant, so 'nearest' loses meaning and accuracy degrades. Feature scaling and dimensionality reduction help.
How to think about it
The crisp answer
Pick k by cross-validation: try a range and choose the value minimizing validation error. Small k gives a flexible, low-bias but high-variance model that chases noise; large k smooths predictions, raising bias but lowering variance. For binary classification use an odd k to avoid tie votes.
Why k is a bias-variance dial
k = 1 memorizes the training set — perfect train accuracy, fragile on test data. As k grows the decision boundary smooths until, at k = N, every point gets the majority class. The Analytics Vidhya KNN question set treats this as the central KNN tuning question.
The curse of dimensionality
KNN relies on a meaningful notion of “close.” In high dimensions, volume grows so fast that data becomes sparse and the distance to the nearest and farthest neighbors converges — everything is roughly equidistant. “Nearest” stops being informative, so accuracy collapses and you’d need exponentially more data to keep neighborhoods dense.
Concrete mitigations
- Scale features (KNN is distance-based; unscaled features dominate).
- Reduce dimensions with PCA or feature selection before applying KNN.
- Use distance weighting so closer neighbors count more.
The common trap
Forgetting that KNN is a lazy learner: no training cost, but every prediction scans the dataset, making inference slow and memory-heavy at scale (mitigate with KD-trees, ball trees, or approximate nearest neighbors). Also forgetting scaling entirely. Follow-up: “What happens at k = 1 vs k = N?” — k=1 overfits, k=N predicts the global majority and underfits.