What is the kernel trick in SVM, and why does it work?

The kernel trick lets an SVM learn a nonlinear decision boundary by implicitly mapping data into a higher-dimensional space where the classes can become linearly separable, without ever computing that mapping explicitly. It works because the SVM's dual formulation depends on the data only through dot products between pairs of points, and a kernel function returns the value of that dot product in the high-dimensional space while operating only on the original inputs. Common kernels are linear, polynomial, and RBF.
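A tiny numeric check can make the "dot product without the mapping" claim concrete. The sketch below assumes a homogeneous degree-2 polynomial kernel on 2-D inputs; the helper names `phi` and `poly2_kernel` are made up for illustration and are not part of any library.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2-D input (illustrative helper)."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: (x . z)^2."""
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

# Dot product taken in the mapped 3-D space (the mapping is materialised).
explicit = float(np.dot(phi(x), phi(z)))
# Same quantity computed from the original 2-D vectors only.
implicit = poly2_kernel(x, z)

print(explicit, implicit)
```

Both numbers agree, but the kernel never builds the 3-D vectors. For an RBF kernel the corresponding feature space is infinite-dimensional, so the shortcut is the only practical route.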

What do the C and gamma hyperparameters control in an SVM, and how do they relate to overfitting?

C controls the soft-margin tradeoff: large C penalizes misclassifications heavily, producing a narrow margin that can overfit, while small C allows more slack for better generalization. Gamma (for RBF kernels) sets how far one training point's influence reaches: high gamma keeps that influence very local and produces a wiggly boundary that overfits, while low gamma spreads it out and produces a smoother boundary that can underfit if pushed too far. You tune both jointly via cross-validation after scaling features.
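One possible way to tune the two jointly is a small grid search inside a scaling pipeline. This is a sketch, not a recommended recipe: the dataset (`make_moons`), the grid values, and the 5-fold setting are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy non-linear data (assumed for illustration).
X, y = make_moons(n_samples=300, noise=0.25, random_state=0)

# Scaling lives inside the pipeline so each CV fold is scaled on its own training split.
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# make_pipeline names the SVC step "svc", hence the "svc__" prefix.
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [0.01, 0.1, 1, 10],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Searching on a log-spaced grid is a common starting point because both parameters act multiplicatively; a finer search around the best cell can follow.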

How does an SVM work, and what is the kernel trick?

An SVM finds the hyperplane that maximises the margin, meaning the distance from the boundary to the nearest training points of each class (the support vectors). When data is not linearly separable, the kernel trick implicitly maps inputs to a high-dimensional feature space, computing inner products there without ever materialising the transformation. The result is a non-linear decision boundary in the original space, while the optimisation itself stays that of a linear SVM.

What does the C parameter control in a Support Vector Machine?

C is the regularisation parameter that trades margin width against training error tolerance. A small C allows many margin violations (wide margin, simpler boundary, higher bias) while a large C penalises violations heavily, forcing a narrow margin that fits the training data more tightly but risks overfitting.

Support vector machines — Machine Learning

Many lines can separate two classes. A support vector machine (SVM) asks a sharper question: which boundary leaves the widest gap between the classes? That single idea — the maximum margin — gives SVMs strong generalization, and the kernel trick that comes with it is one of the most elegant moves in all of ML (and a perennial interview question).

Max margin: the widest street

Picture the boundary as a street between the two classes. The SVM makes that street as wide as possible. The points that touch the curb (the closest ones on each side) are the support vectors, and they’re the only points that define the boundary. Move a far-away point, keeping it clear of the street, and nothing changes; move a support vector and the whole boundary shifts. That focus on the hardest cases is a large part of why SVMs generalize well.

The SVM picks the boundary with the widest margin. Only the support vectors — the points on the margin — determine it; move a far point and nothing changes.
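The "only the support vectors matter" claim can be checked directly: fit once, throw away every point that is not a support vector, and fit again. The sketch below assumes two well-separated blobs and a linear kernel; the data and the tight `tol` setting are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters (assumed toy data).
X, y = make_blobs(n_samples=200, centers=[[-2, -2], [2, 2]],
                  cluster_std=1.0, random_state=0)

# Fit on everything; a tight tolerance makes the two fits easier to compare.
full = SVC(kernel="linear", C=1.0, tol=1e-6).fit(X, y)
sv_idx = full.support_  # indices of the support vectors in X
print(f"{len(sv_idx)} of {len(X)} points are support vectors")

# Refit using ONLY the support vectors.
reduced = SVC(kernel="linear", C=1.0, tol=1e-6).fit(X[sv_idx], y[sv_idx])

print("full   :", full.coef_.ravel(), full.intercept_)
print("reduced:", reduced.coef_.ravel(), reduced.intercept_)
```

The weight vectors should agree up to solver tolerance, even though most of the training set was discarded in the second fit.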

Soft margins and C

Real data isn’t cleanly separable, so SVMs use a soft margin — they allow some points to sit inside the margin or on the wrong side, for a penalty. The C hyperparameter sets how much you punish those violations:

Low C → a wider, more tolerant margin (more regularization). Accepts some misclassifications for a smoother boundary. Higher bias, lower variance.
High C → a narrow margin that tries hard to classify every training point correctly. Lower bias, higher variance — risks overfitting.

C is the SVM’s bias–variance dial: low C widens the street and tolerates errors; high C narrows it to fit every training point.
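To see the dial in action, one option is to fit the same linear SVM at a few values of C and watch the street width and the number of support vectors. This is a sketch on assumed, deliberately overlapping toy data; the width is computed as 2 / ||w||, the standard margin width for a linear SVM.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Overlapping clusters, so some margin violations are unavoidable (assumed toy data).
X, y = make_blobs(n_samples=200, centers=[[-1, -1], [1, 1]],
                  cluster_std=1.0, random_state=0)

n_support = {}
width = {}
for C in [0.01, 1.0, 100.0]:
    clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=C)).fit(X, y)
    svc = clf.named_steps["svc"]
    width[C] = 2.0 / np.linalg.norm(svc.coef_)   # street width in scaled units
    n_support[C] = int(svc.n_support_.sum())     # points on or inside the street
    print(f"C={C:>6}: margin width {width[C]:.2f}, "
          f"{n_support[C]} support vectors, train acc {clf.score(X, y):.3f}")
```

With low C the street is wide and many points sit inside it, so many of them become support vectors; raising C narrows the street and shrinks that set.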

The kernel trick — bending the boundary

A straight line can’t separate concentric circles. The SVM’s superpower is the kernel trick: it implicitly maps the data into a higher-dimensional space where a straight boundary does exist, without ever computing those coordinates — it only needs dot products, which the kernel computes directly.

Linear kernel — a straight boundary. Fast; the right choice for high-dimensional data (text) that’s already roughly linearly separable.
RBF (Gaussian) kernel — the default for non-linear data; wraps curved, blobby boundaries. Tuned by gamma (how local each point’s influence is).
Polynomial — curved boundaries of a fixed degree.
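For reference, the "only needs dot products" statement corresponds to the standard soft-margin dual, where the data appears only inside K. The notation below is chosen here rather than taken from the lesson: labels y_i are in {-1, +1}, alpha_i are the dual coefficients, and the polynomial and RBF forms follow a common parameterisation with offset r and degree d.

```latex
\begin{aligned}
\max_{\alpha}\quad & \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \\
\text{s.t.}\quad & 0 \le \alpha_i \le C, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0 \\[4pt]
f(x) &= \operatorname{sign}\Big(\sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b\Big) \\[4pt]
K_{\text{linear}}(x, z) &= x^\top z \\
K_{\text{poly}}(x, z) &= (\gamma\, x^\top z + r)^d \\
K_{\text{RBF}}(x, z) &= \exp\!\big(-\gamma \lVert x - z \rVert^2\big)
\end{aligned}
```

Points with alpha_i = 0 drop out of the sum in f(x), which is why only the support vectors shape the boundary, and the upper bound C on each alpha_i is where the soft margin enters.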

from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_circles

# Concentric circles: NOT linearly separable.
X, y = make_circles(n_samples=500, noise=0.08, factor=0.4, random_state=0)

for kernel in ["linear", "rbf"]:
    # SVMs need scaled features (they compare distances).
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=1.0))
    # Mean accuracy over 5 cross-validation folds.
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{kernel:>6} kernel: {acc:.3f} CV accuracy")

print("\nExpect linear near chance (~50%) and RBF near perfect on this ring.")
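Gamma deserves the same treatment as the kernel choice. The sketch below sweeps gamma for an RBF SVM and compares training accuracy with held-out accuracy; it assumes a noisier dataset (`make_moons` with noise 0.3) than the circles above so that overfitting has something to latch onto, and the gamma values are illustrative.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Noisy two-moons data (assumed for illustration).
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)

gaps = {}
for gamma in [0.1, 1.0, 1000.0]:
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma=gamma))
    # return_train_score=True also reports accuracy on each fold's training split.
    scores = cross_validate(clf, X, y, cv=5, return_train_score=True)
    train = scores["train_score"].mean()
    held_out = scores["test_score"].mean()
    gaps[gamma] = train - held_out  # a large gap signals overfitting
    print(f"gamma={gamma:>7}: train {train:.3f}, held-out {held_out:.3f}")
```

A very high gamma lets the model memorise the training points, so training accuracy climbs while held-out accuracy falls behind. That gap is the overfitting the lesson warns about.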

In one breath

An SVM picks the boundary with the widest margin; only the support vectors (the closest points, on the margin) define it — move a far point and nothing changes.
Soft margins allow some violations, and C is the bias-variance dial: low C = wider, tolerant margin (more bias); high C = narrow margin that fits hard (more variance, overfit risk).
The kernel trick maps data into a higher-dimensional space where a linear boundary exists, using only dot products — linear (high-D/text), RBF (curved, tune gamma), polynomial.
RBF is not a free upgrade — match the kernel to the data (start linear); RBF’s two knobs (C and gamma) overfit easily.
Always scale features (SVMs compare distances). Best on high-dimensional, smaller datasets; gradient-boosted trees usually win on large tabular data.

Quick check

Q1. What are support vectors, and why do they matter?

Q2. What does the C hyperparameter control?

Q3. What is the kernel trick?

The last classification topic before the tree ensembles: every real problem is lopsided, so class imbalance — and the fixes that stop a model from just predicting the majority.
