datarekha

LU Decomposition

Factor A = L·U into a lower- and an upper-triangular matrix — it is Gaussian elimination remembered, and it makes solving Ax = b for many right-hand sides cheap.

6 min read Intermediate GATE DA Lesson 30 of 122

What you'll learn

  • LU factorisation: A = L·U with L lower-triangular (unit diagonal) and U upper-triangular
  • U is the row-echelon form; L stores the elimination multipliers
  • Why LU is used: solve Ax = b cheaply for many different b via forward- then back-substitution
  • Pivoting (PA = LU) exists for numerical stability and when a zero pivot appears

Before you start

Here’s the small annoyance behind LU: Gaussian elimination does a pile of work to triangulate a matrix, and then we throw all of it away. LU decomposition is just elimination that keeps the receipt. You split A into two triangular pieces, A = L · U. The U is the upper-triangular echelon form you already get from elimination. The L is a lower-triangular matrix (with 1s on its diagonal) that records the multipliers you used along the way.

The win is reuse. Once L and U are saved, solving Ax = b for many different right-hand sides becomes cheap — the heavy triangulation happens once, and each new solve is two quick substitutions. This is exactly why a general linear solver like scipy.linalg.solve factors the matrix once with LU under the hood rather than re-running elimination per right-hand side.

The factorisation

Afull=Llower, unit diagonal×Uupper (echelon form)
L holds the elimination multipliers below its unit diagonal; U is the triangular result of elimination.

Concretely: when you clear an entry by the operation R2 -> R2 − m·R1, the number m is a multiplier. U is what A becomes after all such operations; L is the identity matrix with each multiplier m slotted into the position it cleared. That is the entire construction — no extra computation beyond the elimination you already do.

To then solve Ax = b, write it as L(Ux) = b and split into two triangular solves:

  • Forward-substitution on Ly = b (top row down) recovers y.
  • Back-substitution on Ux = y (bottom row up) recovers x.

Each substitution costs about O(n^2), while the factorisation costs O(n^3) and is done only once. So for k different right-hand sides the total is one O(n^3) factorisation plus k cheap O(n^2) solves — far better than re-eliminating k times. That reuse is the whole reason LU exists.

How GATE asks this

Expect either an MCQ on the idea (“what does L store?”, “why use LU instead of re-running elimination?”) or a small NAT that hands you a 2-by-2 or 3-by-3 matrix and asks for one entry of L or U after factorisation. The mechanics are the whole game: run elimination, read U off the result, and read each L entry off the multiplier that cleared that position. Pivoting rarely appears beyond a one-line “it exists for stability” fact.

Worked example

Factor A = [[2, 1], [4, 3]].

A = [ 2  1 ]
    [ 4  3 ]

eliminate the (2,1) entry:  multiplier m = 4 / 2 = 2
R2 -> R2 - 2·R1:

U = [ 2  1 ]        L = [ 1  0 ]     (slot m = 2 into position (2,1),
    [ 0  1 ]            [ 2  1 ]      keep 1s on the diagonal)

The single multiplier was 4/2 = 2, so L has a 2 under its diagonal and U is the echelon form. Verify L · U = A:

[ 1  0 ] [ 2  1 ]   [ 1·2 + 0·0   1·1 + 0·1 ]   [ 2  1 ]
[ 2  1 ] [ 0  1 ] = [ 2·2 + 1·0   2·1 + 1·1 ] = [ 4  3 ]  = A  ✓

So L = [[1, 0], [2, 1]] and U = [[2, 1], [0, 1]].

Quick check

Quick check

0/5
Q1Factor A = [[3, 6], [1, 5]] as L·U with unit diagonal on L. Enter the (2,1) entry of L (the multiplier).numerical answer — type a number
Q2For A = [[2, 1], [4, 3]] factored as L·U with L = [[1, 0], [2, 1]], what is the (2,2) entry of U?numerical answer — type a number
Q3Which statements about A = L·U are correct? (select all that apply)select all that apply
Q4Why prefer LU decomposition over re-running Gaussian elimination from scratch each time, when solving Ax = b for many different b with the same A?
Q5A matrix has a 0 in its first pivot position. Which single technique lets you still factor it?

Practice this in an interview

All questions
What is GELU and why does it outperform ReLU in transformer models?

GELU (Gaussian Error Linear Unit) multiplies the input by the probability that a standard Gaussian random variable is smaller than it, producing a smooth, non-monotonic curve that approximates ReLU but with a stochastic regularization flavor. Transformers favor GELU because the smooth gradient near zero improves optimization in deep attention-based architectures.

What techniques reduce LLM cost and latency in production?

Cost scales with input plus output tokens; latency scales with output tokens and model size. The highest-leverage levers are: model routing (use a small model when the task is simple), prompt caching (reuse expensive prefix computation), output length control, and batching. Together these can cut spend 60–90% without quality regression.

How does Ordinary Least Squares derive the coefficient vector, and what is the closed-form solution?

OLS minimizes the sum of squared residuals. Setting the gradient of the loss to zero yields the normal equations, whose unique solution is the projection of y onto the column space of X. The closed-form is the hat matrix formula β = (XᵀX)⁻¹Xᵀy.

What is the fundamental difference between L1 (Lasso) and L2 (Ridge) regularization, and when do you choose each?

L1 adds the sum of absolute coefficient values to the loss, which drives some coefficients to exactly zero and performs implicit feature selection. L2 adds the sum of squared coefficients, which shrinks all weights proportionally but rarely zeroes any out. Lasso is preferred when you suspect only a few features matter; Ridge is preferred when most features contribute small effects.

Sign in to track your progress

Completed lessons, your XP, level, and streak save to your account — it's free and takes a few seconds.

Explore further

Related lessons

Skip to content