Explain the EM algorithm in the context of fitting a Gaussian Mixture Model.

EM fits a GMM by alternating two steps: the E-step computes each point's responsibility (posterior probability) under each Gaussian using current parameters, and the M-step updates the means, covariances, and mixing weights to maximize the expected log-likelihood given those responsibilities. It iterates until the log-likelihood stops improving. Because the objective is non-convex, EM is only guaranteed to reach a local optimum, so initialization and multiple restarts matter.
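
For reference, the two steps are usually written as the updates below. The notation is an assumption added here, not taken from the text above: $\gamma_{ik}$ is the responsibility of component $k$ for point $x_i$, $\pi_k$ is the mixing weight, $\mu_k$ and $\Sigma_k$ are the mean and covariance, and $N_k$ is the effective number of points in component $k$.

```latex
\begin{aligned}
\text{E-step:}\quad
\gamma_{ik} &= \frac{\pi_k \,\mathcal{N}(x_i \mid \mu_k, \Sigma_k)}
                    {\sum_{j=1}^{K} \pi_j \,\mathcal{N}(x_i \mid \mu_j, \Sigma_j)} \\[4pt]
\text{M-step:}\quad
N_k &= \sum_{i=1}^{n} \gamma_{ik}, \qquad
\pi_k = \frac{N_k}{n}, \qquad
\mu_k = \frac{1}{N_k} \sum_{i=1}^{n} \gamma_{ik}\, x_i, \\[4pt]
\Sigma_k &= \frac{1}{N_k} \sum_{i=1}^{n} \gamma_{ik}\,(x_i - \mu_k)(x_i - \mu_k)^{\top}
\end{aligned}
```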

How does a Gaussian Mixture Model differ from k-means, and when would you prefer it?

A GMM models data as a mixture of Gaussian distributions and assigns soft probabilities of cluster membership. It is fit with the EM algorithm and can capture clusters that are elliptical and of different sizes. K-means does hard assignment to the nearest centroid and implicitly assumes spherical, equal-size clusters. Prefer a GMM when clusters overlap, have different shapes or covariances, or when you need probabilistic (soft) assignments.

What are the main limitations of k-means clustering?

K-means requires specifying k upfront, assumes clusters are convex and roughly equal in size and density, is sensitive to outliers and feature scale, and can converge to local minima. It struggles with non-globular shapes such as rings or crescents, and it assigns every point to exactly one cluster with no notion of uncertainty.

How does k-means clustering work?

K-means partitions n points into k clusters by alternating between two steps: assigning each point to its nearest centroid, then recomputing each centroid as the mean of its assigned points. It repeats until assignments stop changing. Neither step can increase the within-cluster sum of squared distances, so the procedure is guaranteed to converge, but only to a local optimum, not necessarily the global one.
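
As a rough illustration of those two steps, here is a minimal NumPy sketch. The function name `kmeans`, the random-point initialization, and the toy data are all assumptions made for this example; a real project would more likely use `sklearn.cluster.KMeans`.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal k-means loop; illustrative only, not optimized."""
    rng = np.random.default_rng(seed)
    # Assumption: start from k distinct data points picked at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: squared distance from every point to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its assigned points.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Hypothetical toy data: two tight, well-separated blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
centroids, labels = kmeans(X, k=2)
print(centroids.round(1))
```

Which blob ends up as cluster 0 depends on the initialization, so only the centroid locations are meaningful, not the label numbers.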

Gaussian mixture models — Machine Learning

K-means forces every point into exactly one cluster and assumes those clusters are round, equal-sized blobs. Gaussian mixture models (GMMs) relax both: they model the data as a blend of Gaussian “bells,” give each point a probability of belonging to each one, and allow elongated, differently shaped clusters. GMM is the probabilistic generalization of k-means.

Soft assignment — the key difference

K-means says “point 7 is in cluster B.” A GMM says “point 7 is 70% cluster B, 30% cluster A.” For points deep inside a cluster that distinction barely matters — but for points in the overlap between clusters, the soft probability is far more honest than a coin-flip hard label.

K-means draws a hard line and pushes every point to one side; a GMM fits shaped ellipses and reports membership probabilities.

GMMs also fit covariance per cluster — so clusters can be stretched, rotated ellipses, not just circles. That alone lets them fit data k-means mangles.
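
To see what "covariance per cluster" looks like in practice, the sketch below fits a GMM to two deliberately tilted clusters and prints the fitted covariance matrices. The toy data and the mixing matrix `A` are made up for illustration; `covariance_type="full"` is the scikit-learn setting that lets each component have its own unrestricted covariance.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Hypothetical toy data: isotropic noise pushed through a linear map,
# which produces long, tilted ellipses. The second cluster is shifted up.
A = np.array([[2.0, 1.5], [0.0, 0.3]])
X = np.vstack([
    rng.normal(size=(300, 2)) @ A,
    rng.normal(size=(300, 2)) @ A + np.array([0.0, 6.0]),
])

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)

print(gmm.covariances_.shape)        # one 2x2 matrix per component
print(gmm.covariances_[0].round(2))  # off-diagonal terms capture the tilt
```

The non-zero off-diagonal entries are what k-means has no way to represent. scikit-learn also accepts `"tied"`, `"diag"`, and `"spherical"` for `covariance_type` if you want fewer parameters per cluster.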

Fitting with EM

You can’t directly compute the clusters because you don’t know which point came from which Gaussian. The Expectation-Maximization (EM) algorithm solves this by alternating, much like k-means does:

E-step — given the current Gaussians, compute each point’s probability of belonging to each (soft assignment).
M-step — given those probabilities, update each Gaussian’s mean, covariance, and weight.

Repeat until it converges. In fact, k-means is a special case of EM with hard assignments and a fixed spherical covariance shared by every cluster; GMM is the fuller, probabilistic version.
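
If you want to see the two steps without the library in the way, here is a from-scratch sketch in NumPy and SciPy. The helper names `e_step` and `m_step`, the crude initialization, the small ridge term `reg`, and the toy data are all assumptions for this example; scikit-learn's own implementation is more careful, working in log space for numerical stability.

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, weights, means, covs):
    # Weighted density of every point under every component, shape (n, k).
    dens = np.column_stack([
        w * multivariate_normal.pdf(X, mean=m, cov=c)
        for w, m, c in zip(weights, means, covs)
    ])
    total = dens.sum(axis=1, keepdims=True)
    resp = dens / total                # responsibilities; each row sums to 1
    log_lik = np.log(total).sum()      # log-likelihood under current parameters
    return resp, log_lik

def m_step(X, resp, reg=1e-6):
    n, d = X.shape
    nk = resp.sum(axis=0)              # effective number of points per component
    weights = nk / n
    means = (resp.T @ X) / nk[:, None]
    covs = []
    for j in range(resp.shape[1]):
        diff = X - means[j]
        cov = (resp[:, j, None] * diff).T @ diff / nk[j]
        covs.append(cov + reg * np.eye(d))  # tiny ridge keeps the matrix invertible
    return weights, means, np.array(covs)

# Hypothetical toy data: two overlapping blobs with different spreads.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(4, 1.5, (200, 2))])

# Assumption: start from two random data points and identity covariances.
weights = np.array([0.5, 0.5])
means = X[rng.choice(len(X), size=2, replace=False)]
covs = np.array([np.eye(2), np.eye(2)])

log_liks = []
for _ in range(50):
    resp, ll = e_step(X, weights, means, covs)
    weights, means, covs = m_step(X, resp)
    log_liks.append(ll)

print(f"log-likelihood: {log_liks[0]:.1f} -> {log_liks[-1]:.1f}")
```

The log-likelihood should never drop from one iteration to the next (up to the tiny effect of the ridge term), which makes it a handy sanity check when writing EM by hand.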

import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Two round blobs with very different spreads, so they overlap: a case k-means struggles with.
X, _ = make_blobs(n_samples=500, centers=2, cluster_std=[1.0, 2.8], random_state=2)

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)
proba = gmm.predict_proba(X)            # soft: probability per cluster

# Find the most uncertain point (closest to a 50/50 split)
uncertain = np.argmin(np.abs(proba[:, 0] - 0.5))
print(f"Most ambiguous point: {proba[uncertain].round(2)} (cluster probabilities)")
print(f"GMM can express '{proba[uncertain][0]:.0%} A, {proba[uncertain][1]:.0%} B' —")
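# For contrast, k-means gives the same point a single hard label.
hard = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(f"k-means forces it fully into one cluster (label {hard[uncertain]}).")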
print(f"\nModel selection: BIC = {gmm.bic(X):.0f} (try a few n_components, pick the lowest BIC)")
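
The BIC hint in the last line can be turned into a small loop. This is a sketch under assumptions: the three explicit blob centers, the candidate range of 1 to 6 components, and `n_init=5` are arbitrary choices for illustration, not recommendations from the lesson.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Hypothetical toy data: three well-separated blobs at hand-picked centers.
X, _ = make_blobs(n_samples=600, centers=[[-6, 0], [0, 6], [6, 0]],
                  cluster_std=1.0, random_state=0)

bics = {}
for k in range(1, 7):
    # n_init=5 reruns EM from five initializations and keeps the best fit,
    # which helps avoid a poor local optimum.
    gmm = GaussianMixture(n_components=k, covariance_type="full",
                          n_init=5, random_state=0).fit(X)
    bics[k] = gmm.bic(X)

best_k = min(bics, key=bics.get)
print({k: round(v) for k, v in bics.items()})
print("lowest BIC at n_components =", best_k)
```

On data this cleanly separated you would expect the minimum near the true number of blobs; on real data the curve is often flatter, so it is worth looking at the whole set of scores rather than only the argmin.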

In one breath

A GMM models data as a blend of Gaussian “bells” and gives each point a probability of belonging to each — soft assignment, where k-means forces a hard label.
It also fits a covariance per cluster, so clusters can be stretched, rotated ellipses, not just circles.
It’s fit by EM: the E-step computes soft memberships, the M-step updates each Gaussian’s mean, covariance, and weight, repeated to convergence.
k-means is the special case of EM with hard assignments and fixed spherical covariance.
Choose the number of components with BIC/AIC (lower is better), use several restarts to dodge local optima, and switch to DBSCAN for non-Gaussian shapes.

Quick check

Q1. What's the key difference between a GMM and k-means?

Q2. What do the E-step and M-step of EM do?

Q3. How do you choose the number of components in a GMM?

That completes the clustering family. To visualize any of these clusters in 2D, project with t-SNE & UMAP.
