Support Vector Machines
Find the separating line with the widest margin. The margin is 2/‖w‖, only the support vectors define it, and kernels bend it into curves — all GATE-frequent.
What you'll learn
- SVM picks the max-margin hyperplane wᵀx + b = 0
- The margin width equals 2/‖w‖, so maximising margin means minimising ‖w‖
- Only the closest points — support vectors — determine the boundary
- Soft margins handle overlap; kernels produce curved boundaries (both at a conceptual level only)
Before you start
Many lines can separate two classes — logistic regression settles on one of them.
A Support Vector Machine (SVM) is pickier: among all separating lines it chooses
the one that sits as far as possible from both classes, leaving the widest
margin. That single choice gives SVMs their robustness — one reason they were the
go-to classifier for text and image tasks before deep learning, and still a strong
baseline on small, high-dimensional datasets. It leads to two facts GATE leans on
every couple of years: the margin equals 2/‖w‖, and only a handful of points
actually matter.
The maximum-margin hyperplane
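The decision boundary is a hyperplane wᵀx + b = 0. The margin is the empty band
around it, running out to the nearest point of each class. With w and b scaled so
that those nearest points satisfy wᵀx + b = ±1, each edge of the band lies 1/‖w‖
from the boundary, so its full width works out to
margin = 2 / ‖w‖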
SVM maximises that width. Because the 2 is fixed, maximising the margin is the
same as minimising ‖w‖ — a small weight norm means a wide margin.
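To make the formula concrete, here is a minimal Python sketch. It assumes the usual scaling in which the points on the margin edges satisfy wᵀx + b = ±1. The function names and the values of w and b are illustrative, not taken from any GATE question.

```python
import math

def margin_width(w):
    """Full margin width 2 / ||w|| for a weight vector given as a list of floats."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0:
        raise ValueError("w must be non-zero")
    return 2.0 / norm

def distance_to_hyperplane(x, w, b):
    """Perpendicular distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

# Illustrative values (an assumption for this sketch): ||w|| = 5.
w, b = [3.0, 4.0], -5.0

print(margin_width(w))                          # 2 / 5
# The point (2, 0) satisfies w.x + b = 1, so it lies on a margin edge,
# exactly half a margin width away from the boundary.
print(distance_to_hyperplane([2.0, 0.0], w, b))
```

Doubling every entry of w halves the value returned by `margin_width`, which is the "small norm, wide margin" relationship in numbers.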
Support vectors — only the closest points matter
The points sitting exactly on the margin edges are the support vectors. They alone pin down the boundary. Every other point lies safely outside the band and contributes nothing: delete a non-support-vector and the boundary and margin do not move at all. This is why SVMs are described by “support” vectors — the solution depends on a small subset of the data, not all of it.
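The invariance can be checked by hand in one dimension, where the hard-margin solution can be written down directly: the boundary sits midway between the rightmost negative point and the leftmost positive point. The sketch below assumes separable 1-D data with every negative point to the left of every positive point. The helper `hard_margin_1d` and the data are hypothetical, chosen only for illustration.

```python
def hard_margin_1d(points):
    """Exact hard-margin SVM for separable 1-D data.

    points: list of (x, label) with label in {-1, +1}; assumes all
    negatives lie to the left of all positives.
    Returns (w, b, support_vectors).
    """
    max_neg = max(x for x, label in points if label == -1)
    min_pos = min(x for x, label in points if label == +1)
    if max_neg >= min_pos:
        raise ValueError("data not separable in the assumed orientation")
    w = 2.0 / (min_pos - max_neg)           # makes 2/|w| equal the gap
    b = -w * (min_pos + max_neg) / 2.0      # boundary at the midpoint
    support = [(x, label) for x, label in points if x in (max_neg, min_pos)]
    return w, b, support

data = [(-3.0, -1), (-1.0, -1), (1.0, +1), (4.0, +1)]

full = hard_margin_1d(data)
# Remove a point that is NOT a support vector: the solution is unchanged.
without_far_point = hard_margin_1d([p for p in data if p != (4.0, +1)])
# Remove a support vector: the boundary and the margin both move.
without_support = hard_margin_1d([p for p in data if p != (1.0, +1)])

print(full[:2] == without_far_point[:2])    # same (w, b)
print(2.0 / abs(without_support[0]))        # new, wider margin
```

Deleting the far point (4, +1) leaves w and b untouched, while deleting the support vector (1, +1) shifts the boundary and widens the margin from 2 to 5.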
Soft margins and kernels
Two extensions, conceptual only at GATE level:
- Soft margin. Real data overlaps, so a hard margin (zero violations) may be impossible. The soft-margin SVM adds slack that lets a few points sit inside the margin or on the wrong side, trading a little misclassification for a wider, more robust boundary.
- Kernels. A linear SVM can only draw straight boundaries. A kernel (e.g. polynomial or RBF) implicitly maps the data into a higher-dimensional space where a straight boundary there corresponds to a curved boundary back in the original space — all without ever computing the mapping explicitly (the “kernel trick”). This is how SVMs fit non-linear class shapes.
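The kernel trick can be seen with a small numerical check. The sketch below uses the homogeneous degree-2 polynomial kernel on 2-D inputs, picked only because its feature map is short enough to write out. The function names and sample points are illustrative assumptions.

```python
import math

def poly2_kernel(x, z):
    """Degree-2 polynomial kernel k(x, z) = (x . z)^2 for 2-D inputs."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def phi(x):
    """Explicit 3-D feature map whose inner product reproduces poly2_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

x, z = (1.0, 2.0), (3.0, -1.0)

kernel_value = poly2_kernel(x, z)       # computed in the original 2-D space
explicit_value = dot(phi(x), phi(z))    # computed in the 3-D feature space

print(math.isclose(kernel_value, explicit_value))
```

Both routes give the same number, but the kernel never builds `phi(x)`. For an RBF kernel the corresponding feature space is infinite-dimensional, so the kernel route is the only practical one.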
How GATE asks this
Usually a conceptual MCQ/MSQ on the margin and support vectors, sometimes with a
tiny hard-margin calculation. Two patterns recur: (1) “given ‖w‖, what is the
margin?” — plug into 2/‖w‖; and (2) “what happens to the boundary if a
non-support-vector is removed / a support vector is removed?” Both appeared in 2024
and 2025. No dual formulation or KKT conditions are required.
Worked example — support-vector invariance and the margin
SVM1 is trained on a dataset. SVM2 is trained on the same data but with one point — a non-support-vector — removed. Also, you are told the learned weight vector has
‖w‖ = 2. Compare the two margins, and give the margin width.
Reason it out in two steps:
- Does the margin change? The boundary and margin of an SVM are determined only by the support vectors. The removed point is not a support vector, so it never touched the margin and never constrained the solution. Removing it leaves the support vectors, and therefore w, b, the boundary, and the margin, unchanged. SVM2’s margin equals SVM1’s margin.
- What is the width? Plug the given norm into the formula:
margin = 2 / ‖w‖ = 2 / 2 = 1.
So the margin is unchanged by deleting the non-support-vector, and its width is
1. (GATE DA tested support vectors directly in 2024 — “which set is a possible set
of support vectors?” — and the margin again in 2025.) Sanity check: if instead
‖w‖ = 4, the margin would be 2/4 = 0.5 — a larger weight norm always means a
narrower margin.
Quick check
Practice this in an interview
An SVM finds the hyperplane that maximises the margin, the gap between the boundary and the nearest points of each class (the support vectors). When data is not linearly separable, the kernel trick implicitly maps inputs to a high-dimensional feature space: it computes inner products there without ever materialising the transformation, which gives non-linear decision boundaries while every computation stays in the original input space.
C is the regularisation parameter that trades margin width against training error tolerance. A small C allows many margin violations (wide margin, simpler boundary, higher bias) while a large C penalises violations heavily, forcing a narrow margin that fits the training data more tightly but risks overfitting.
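The trade-off can be made concrete with the standard soft-margin objective, ½‖w‖² + C · Σ ξᵢ, where each slack ξᵢ = max(0, 1 − yᵢ(wᵀxᵢ + b)) measures how far a point intrudes into the margin. The sketch below only evaluates that objective for a fixed, hand-picked w and b and does not train anything. The data and function name are illustrative assumptions.

```python
def soft_margin_objective(w, b, C, X, y):
    """Return (objective, slacks) for 0.5*||w||^2 + C * sum(slacks)."""
    slacks = [
        max(0.0, 1.0 - yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b))
        for xi, yi in zip(X, y)
    ]
    objective = 0.5 * sum(wj * wj for wj in w) + C * sum(slacks)
    return objective, slacks

# Hand-picked boundary x1 = 0 with margin edges at x1 = -1 and x1 = +1.
w, b = (1.0, 0.0), 0.0
X = [(2.0, 0.0), (-2.0, 0.0), (0.5, 0.0), (-0.5, 0.0)]
y = [+1, -1, +1, +1]   # the last point is on the wrong side of the boundary

small_C, slacks = soft_margin_objective(w, b, 0.1, X, y)
large_C, _ = soft_margin_objective(w, b, 10.0, X, y)

print(slacks)            # zero for points outside the margin
print(small_C, large_C)  # the same violations cost far more under large C
```

With the same two violations, the penalty term barely registers at C = 0.1 but dominates at C = 10, which is why a large C pushes the optimiser toward a narrower margin that fits those points.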
Right-skewed features (long tail on the right) concentrate most values near zero while a few extreme values pull the mean up, which distorts distance-based models and linear regression. Common fixes are log, square-root, or Box-Cox transformations that compress the tail and make the distribution closer to normal, improving model convergence and reducing the undue influence of large values.
OLS minimizes the sum of squared residuals. Setting the gradient of the loss to zero yields the normal equations XᵀXβ = Xᵀy. When X has full column rank, their unique solution is the closed form β = (XᵀX)⁻¹Xᵀy. The fitted values ŷ = Xβ = Hy are the projection of y onto the column space of X, where H = X(XᵀX)⁻¹Xᵀ is the hat matrix.