datarekha

Gaussian mixture models

Soft clustering — every point gets a probability of belonging to each cluster, not a hard label. How GMMs model data as a mix of Gaussians, fit with EM, and generalize k-means.

7 min read Advanced Machine Learning Lesson 30 of 33

What you'll learn

  • GMMs as a probabilistic mixture of Gaussian clusters
  • Soft assignment vs k-means' hard assignment
  • The EM algorithm intuition and when GMMs beat k-means

Before you start

K-means forces every point into exactly one cluster and assumes those clusters are round, equal-sized blobs. Gaussian mixture models (GMMs) relax both: they model the data as a blend of Gaussian “bells,” give each point a probability of belonging to each one, and allow elongated, differently shaped clusters. GMM is the probabilistic generalization of k-means.

Soft assignment — the key difference

K-means says “point 7 is in cluster B.” A GMM says “point 7 is 70% cluster B, 30% cluster A.” For points deep inside a cluster that distinction barely matters — but for points in the overlap between clusters, the soft probability is far more honest than a coin-flip hard label.

k-means: hard labeloverlap point → 100% one sideGMM: soft probabilityoverlap point → 60% / 40%
k-means draws a hard line and tilts every point to one side; a GMM fits shaped ellipses and reports membership probabilities.

GMMs also fit covariance per cluster — so clusters can be stretched, rotated ellipses, not just circles. That alone lets them fit data k-means mangles.

Fitting with EM

You can’t directly compute the clusters because you don’t know which point came from which Gaussian. The Expectation-Maximization (EM) algorithm solves this by alternating, much like k-means does:

  1. E-step — given the current Gaussians, compute each point’s probability of belonging to each (soft assignment).
  2. M-step — given those probabilities, update each Gaussian’s mean, covariance, and weight.

Repeat until it converges. In fact, k-means is a special case of EM with hard assignments and fixed spherical covariance — GMM is the fuller, probabilistic version.

Quick check

Quick check

0/3
Q1What's the key difference between a GMM and k-means?
Q2What do the E-step and M-step of EM do?
Q3How do you choose the number of components in a GMM?

Next

That completes the clustering family. To visualize any of these clusters in 2D, project with t-SNE & UMAP.

Sign in to track your progress

Completed lessons, your XP, level, and streak save to your account — it's free and takes a few seconds.

Practice this in an interview

All questions
How does a Gaussian Mixture Model differ from k-means, and when would you prefer it?

A GMM models data as a mixture of Gaussian distributions and assigns soft probabilities of cluster membership, fitting clusters that can be elliptical and different sizes via the EM algorithm. K-means does hard assignment to the nearest centroid and implicitly assumes spherical, equal-size clusters. Prefer a GMM when clusters overlap, have different shapes or covariances, or when you need probabilistic (soft) assignments.

Explain the EM algorithm in the context of fitting a Gaussian Mixture Model.

EM fits a GMM by alternating two steps: the E-step computes each point's responsibility (posterior probability) under each Gaussian using current parameters, and the M-step updates the means, covariances, and mixing weights to maximize the expected log-likelihood given those responsibilities. It iterates until the likelihood converges. Because the objective is non-convex, EM only reaches a local optimum, so initialization and multiple restarts matter.

What are the main limitations of k-means clustering?

K-means requires specifying k upfront, assumes clusters are convex and roughly equal in size and density, is sensitive to outliers and feature scale, and can converge to local minima. It struggles with non-globular shapes such as rings or crescents, and it assigns every point to exactly one cluster with no notion of uncertainty.

How does k-means clustering work?

K-means partitions n points into k clusters by alternating between two steps: assigning each point to its nearest centroid, then recomputing each centroid as the mean of its assigned points. It repeats until assignments stop changing, which guarantees convergence but not a globally optimal solution.

Related lessons

Explore further

Skip to content