How does an SVM work, and what is the kernel trick?

An SVM finds the hyperplane that maximises the margin between the two nearest points of each class (the support vectors). When data is not linearly separable, the kernel trick implicitly maps inputs to a high-dimensional feature space — computing inner products there without ever materialising the transformation — enabling non-linear decision boundaries at the cost of linear-space computation.

What does the C parameter control in a Support Vector Machine?

C is the regularisation parameter that trades margin width against training error tolerance. A small C allows many margin violations (wide margin, simpler boundary, higher bias) while a large C penalises violations heavily, forcing a narrow margin that fits the training data more tightly but risks overfitting.

What is the kernel trick in SVM, and why does it work?

The kernel trick lets an SVM find a nonlinear decision boundary by implicitly mapping data into a higher-dimensional space where it becomes linearly separable, without ever computing that mapping explicitly. It works because the SVM's dual formulation depends only on dot products between points, and a kernel function computes that dot product directly in the high-dimensional space. Common kernels are linear, polynomial, and RBF.

What do the C and gamma hyperparameters control in an SVM, and how do they relate to overfitting?

C controls the soft-margin tradeoff: large C penalizes misclassifications heavily, producing a narrow margin that can overfit, while small C allows more slack for better generalization. Gamma (for RBF kernels) sets how far one training point's influence reaches: high gamma makes a wiggly boundary that overfits, low gamma makes it smoother. You tune both jointly via cross-validation after scaling features.

Support Vector Machines — GATE DA

Last lesson left a pencil of perfect separators — infinitely many lines, each splitting two clean classes with zero errors, and no reason yet to prefer one over another. A Support Vector Machine (SVM) ends the indecision with a single, sharp principle: of all those lines, choose the one that sits as far as possible from both classes, leaving the widest empty corridor between them.

Why the widest corridor? Because a new point that lands near the border is the one most likely to be misjudged, and the fattest margin keeps the border as far from every training point as it can. That robustness is what made SVMs the go-to classifier for text and images before deep learning, and a strong baseline on small, high-dimensional data even now. It also leads straight to the two facts GATE leans on every couple of years: the margin equals 2/‖w‖, and only a tiny handful of points actually matter.

The maximum-margin hyperplane

The decision boundary is a hyperplane wᵀx + b = 0. The margin is the empty band around it, running out to the nearest point of each class, and its full width works out to

margin = 2 / ‖w‖

SVM maximises that width. And because the 2 on top is fixed, maximising the margin is the very same problem as minimising ‖w‖ — a small weight norm means a wide corridor. (That answers the last lesson’s puzzle: wide margin and small ‖w‖ are two names for one thing.)

The solid line is the boundary; the dashed lines are the margin edges; only the circled support vectors set their position.

Support vectors — only the closest points matter

The points sitting exactly on the margin edges are the support vectors, and they alone pin down the boundary. Every other point lies safely outside the band and contributes nothing: delete a non-support-vector and the boundary and margin do not move at all. That is the whole reason the method is named for its “support” vectors — the solution rests on a small subset of the data, not all of it. (This also answers the last lesson’s final question: that surprising handful of border points is what decides where the line goes.)

Soft margins and kernels

Two extensions, conceptual only at GATE level:

Soft margin. Real data overlaps, so a hard margin with zero violations may be impossible. The soft-margin SVM adds slack that lets a few points sit inside the margin or on the wrong side, trading a little misclassification for a wider, more robust boundary.
Kernels. A linear SVM can only draw straight boundaries. A kernel (polynomial, RBF) implicitly maps the data into a higher-dimensional space where a straight boundary there corresponds to a curved boundary back in the original space — all without ever computing the mapping explicitly. This “kernel trick” is how SVMs fit non-linear class shapes.

How GATE asks this

Usually a conceptual MCQ/MSQ on the margin and support vectors, sometimes with a tiny hard-margin calculation. Two patterns recur: (1) “given ‖w‖, what is the margin?” — plug into 2/‖w‖; and (2) “what happens to the boundary if a non-support-vector is removed, or a support vector is removed?” Both appeared in 2024 and 2025. No dual formulation or KKT conditions are required.

Worked example — support-vector invariance and the margin

SVM1 is trained on a dataset. SVM2 is trained on the same data but with one point — a non-support-vector — removed. You are also told the learned weight vector has ‖w‖ = 2. Compare the two margins, and give the margin width.

Reason it out in two steps:

Does the margin change? The boundary and margin of an SVM are determined only by the support vectors. The removed point is not a support vector, so it never touched the margin and never constrained the solution. Removing it leaves the support vectors — and therefore w, b, the boundary, and the margin — unchanged. SVM2’s margin equals SVM1’s margin.
What is the width? Plug the given norm into the formula:
```
margin = 2 / ‖w‖ = 2 / 2 = 1.
```

So the margin is unchanged by deleting the non-support-vector, and its width is 1. (GATE DA tested support vectors directly in 2024 — “which set is a possible set of support vectors?” — and the margin again in 2025.) Sanity check: had ‖w‖ = 4 instead, the margin would be 2/4 = 0.5 — a larger weight norm always means a narrower margin.

In one breath

A Support Vector Machine picks, out of all separating hyperplanes wᵀx + b = 0, the maximum-margin one — the widest empty corridor between the classes — and since that corridor’s width is 2/‖w‖, maximising it is identical to minimising ‖w‖; the boundary is pinned only by the support vectors lying on the margin edges (delete any other point and nothing moves), while a soft margin tolerates a little overlap and a kernel implicitly lifts the data into a higher dimension so a straight cut there becomes a curve here.

Practice

Quick check

0/6

Q1Recall — What does 'support vector' refer to in an SVM?

Q2Recall — Which statements about SVMs are correct? (select all that apply)select all that apply

Q3Recall — Which statements about kernels and soft margins are correct? (select all that apply)select all that apply

Q4Trace — A trained SVM has weight vector norm ‖w‖ = 2. What is the width of its margin?numerical answer — type a number

Q5Trace — Another SVM has ‖w‖ = 4. What is its margin width? (2 decimals)numerical answer — type a number

Q6Apply — SVM2 is trained on the same data as SVM1 but with one non-support-vector point removed. How do the two margins compare?

A question to carry forward

Look back over the classifiers so far — logistic regression, LDA, the SVM — and one thing unites them: each hands you a weight vector, a line of opaque arithmetic. To know why a loan was refused you would have to inspect w and do a dot product in your head. The boundary is correct, but it cannot explain itself.

Now imagine a classifier a person could simply read aloud: “Is the applicant’s income below ₹30,000? If yes, is their debt above ₹10,000? Then refuse.” A cascade of plain yes/no questions, each splitting the data a little finer, ending in a verdict you can trace by finger. No weights, no distances, no kernels — and it swallows numeric and categorical features alike. Here is the thread onward: how does such a tree decide which question to ask first, what makes one split “purer” than another, and how do entropy and the Gini index turn “purer” into a number you can compute?

Support Vector Machines

What you'll learn

Before you start

The maximum-margin hyperplane

Support vectors — only the closest points matter

Soft margins and kernels

How GATE asks this

Worked example — support-vector invariance and the margin

In one breath

Practice

Quick check

A question to carry forward

Sign in to track your progress

Practice this in an interview

Related lessons

Explore further