datarekha

Simple Linear Regression

Fit the best straight line through points by minimizing squared vertical errors — the least-squares solution you compute by hand. A 2025 NAT, step by step.

8 min read Intermediate GATE DA Lesson 78 of 122

What you'll learn

  • The line model y = mx + c, and the through-origin model y = wx
  • The squared-error cost Σ(yᵢ − ŷᵢ)² and why we minimize it
  • The least-squares normal equations, and the closed form w = Σ(xᵢyᵢ)/Σ(xᵢ²) for a line through the origin
  • Solving a real GATE DA 2025 NAT: fit y = wx to three points

Before you start

Give a child a scatter of dots and a ruler, and they will lay the ruler where it “looks right” — close to as many dots as possible. Linear regression is that instinct made precise: find the single straight line whose total error against the points is as small as possible. The only thing we must pin down is what “smallest error” means. It is also the model every data team reaches for first — the honest baseline a fancier model has to beat before anyone trusts it.

The model and the cost

A line has the model y = mx + c, where m is the slope and c the intercept. Sometimes we force the line through the origin, y = wx, with a single weight w and no intercept. For each data point the line predicts ŷᵢ = m·xᵢ + c, and the gap between truth and prediction, yᵢ − ŷᵢ, is the residual.

Least squares chooses the line that minimizes the sum of squared residuals:

cost=i( yᵢ − ŷᵢ )²each residual, squared
Square every vertical gap, add them up, and pick the line that makes the total smallest.

Squaring does two jobs: it makes every error positive (so gaps above and below the line cannot cancel out), and it punishes big misses far more than small ones. In the picture below, each shaded square is one residual squared — literally a square whose side is the gap. The least-squares line is the one that makes the total shaded area as small as possible. Drag the endpoints to try to beat it, then hit Solve to snap to the optimum.

The normal equations

Setting the derivative of the cost with respect to each parameter to zero gives the normal equations — the conditions the best line must satisfy. For the through-origin model y = wx, there is just one parameter, and the solution is beautifully compact:

w=∑ xᵢ yᵢ∑ xᵢ²least-squares slope through the origin
Through the origin, the optimal weight is the cross-term sum over the squared-x sum.

For the full line y = mx + c the normal equations give two formulas, but GATE’s 2025 question used the through-origin form above, so that is the one to have at your fingertips.

How GATE asks this

The bread-and-butter form is a NAT: you are handed 3 to 5 points and asked for the slope (or a prediction) of the least-squares fit. GATE DA 2025 asked exactly this — fit y = wx to three points and report w. The recipe never changes: form Σ xᵢ yᵢ and Σ xᵢ², then divide. The occasional MCQ probes the concept — what least squares minimizes, or what happens with an outlier.

Worked example — a real GATE DA 2025 NAT

Fit the model y = wx (through the origin) to the points (−1, 1), (2, −5), and (3, 5) by least squares. Find w. (Real GATE DA 2025 question.)

Build the two sums column by column, then divide:

point        x·y                 x²
(−1,  1)    (−1)(1)  = −1        (−1)² = 1
( 2, −5)    (2)(−5)  = −10        (2)² = 4
( 3,  5)    (3)(5)   =  15        (3)² = 9
            ─────────────        ──────────
   Σ x·y  = −1 − 10 + 15 = 4     Σ x² = 1 + 4 + 9 = 14

        Σ x·y       4
  w  =  ───────  =  ──  =  0.2857…  ≈  0.286
         Σ x²       14

So w ≈ 0.286. Notice the fit does not pass through any single point — it balances all three so the squared vertical gaps are collectively smallest.

Quick check

Quick check

0/5
Q1Fit y = wx (through the origin) by least squares to the points (1, 3), (2, 4), (3, 8). Find w. (2 decimals)numerical answer — type a number
Q2For the real 2025 points (−1,1), (2,−5), (3,5), the least-squares through-origin slope is w = 4/14. What is w to 3 decimals?numerical answer — type a number
Q3Which statements about ordinary least-squares linear regression are TRUE? (select all that apply)select all that apply
Q4A data point currently has a residual of −4 (the line over-predicts by 4). How much does this single point contribute to the squared-error cost?numerical answer — type a number
Q5In the model y = mx + c, what does the intercept c represent?

Practice this in an interview

All questions
How does Ordinary Least Squares derive the coefficient vector, and what is the closed-form solution?

OLS minimizes the sum of squared residuals. Setting the gradient of the loss to zero yields the normal equations, whose unique solution is the projection of y onto the column space of X. The closed-form is the hat matrix formula β = (XᵀX)⁻¹Xᵀy.

What are the core assumptions of linear regression, and what breaks when each is violated?

OLS linear regression rests on five assumptions: linearity, independence of errors, homoscedasticity, normality of residuals, and no perfect multicollinearity. Violating any one of them degrades coefficient estimates, standard errors, or the validity of hypothesis tests.

When should you use gradient descent over the normal equation to fit a linear regression?

The normal equation gives an exact closed-form solution in O(p³) time but becomes impractical when the number of features p is large (typically above ~10,000) because matrix inversion is cubic. Gradient descent scales as O(np) per iteration, making it the only viable option for large feature spaces or online learning.

Why is linear regression unsuitable for binary classification, and what specific problems does logistic regression fix?

Linear regression predicts unbounded real values, so it can output probabilities below 0 or above 1, and its loss function penalizes confident correct predictions. Logistic regression fixes this by applying the sigmoid to map any real score to (0,1) and optimizing log-loss, which is a proper scoring rule aligned with probability calibration.

Sign in to track your progress

Completed lessons, your XP, level, and streak save to your account — it's free and takes a few seconds.

Explore further

Related lessons

Skip to content