What Is Linear Regression (in ML terms)?

Linear regression is a supervised machine learning algorithm that tries to model the relationship between input variables (features) and an output variable (label) by fitting a straight line to the data.

General Form Equation (Simple Linear Regression)

For one input feature 𝑥, the model is: 𝑦 = 𝑤𝑥 + 𝑏

Let’s say you want to predict someone’s weight based on their height:

  • Feature: height (in cm)
  • Label: weight (in kg)

Where:

  • 𝑦 is the label (target/output variable); in this example, weight
  • 𝑥 is the feature (input variable); in this example, height
  • 𝑤 is the weight (slope): how much 𝑦 changes when 𝑥 increases by one unit. This is the model parameter, not the body weight being predicted
  • 𝑏 is the bias (intercept): the value of 𝑦 when 𝑥 = 0

The goal of training is to find the values of 𝑤 and 𝑏 for which the predicted values ŷ are as close as possible to the actual values 𝑦.

Linear regression form: weight = 𝑤 ⋅ height + 𝑏

Matches: 𝑦 = 𝑤𝑥 + 𝑏

So, when doing machine learning:

  • π‘₯ can represent any input (feature), like height, age, number of hours studied, etc.
  • 𝑦 is the prediction the model makes.
  • 𝑀 and 𝑏 are what the model learns from training data.

Step 1: Sample Data

| Person | Height (cm) | Weight (kg) |
|---|---|---|
| A | 160 | 50 |
| B | 165 | 55 |
| C | 170 | 60 |
| D | 175 | 65 |
| E | 180 | 70 |

Here:

  • height is the feature (input).
  • weight is the label (output).

Step 2: Goal

Find the best-fit line: \(\text{weight} = w \cdot \text{height} + b\)
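One common way to make "best-fit" concrete is the sum of squared errors, which is the quantity the least squares method minimizes. The sketch below scores two candidate lines against the Step 1 data. The function name `sum_squared_error` and both candidate (w, b) pairs are hypothetical, chosen only for illustration.

```python
# Data from the table in Step 1.
heights = [160, 165, 170, 175, 180]  # feature x, in cm
weights = [50, 55, 60, 65, 70]       # label y, in kg


def sum_squared_error(w, b, xs, ys):
    """Sum of squared differences between predictions w*x + b and labels y."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys))


# Two hypothetical candidate lines (illustrative guesses, not fitted values).
print(sum_squared_error(0.5, -25, heights, weights))  # 62.5
print(sum_squared_error(1, -108, heights, weights))   # 20
```

The second candidate has the lower error, so it fits this data better than the first. The best-fit line is the one for which no other choice of w and b gives a smaller value.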

Step 3: Use the Formulas

To calculate 𝑤 and 𝑏, we use the least squares method:

\(w = \frac{n\sum(x_i \cdot y_i) - \sum x_i \sum y_i}{n\sum x_i^2 - (\sum x_i)^2}\)

\(b = \frac{\sum y_i - w \sum x_i}{n}\)

Where:

  • \(x_i\) = height of the i-th person
  • \(y_i\) = weight of the i-th person
  • \(n\) = number of data points
  • \(\Sigma\) = summation (e.g., \(\Sigma x_i = x_1 + x_2 + ... + x_n\))
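The two formulas translate almost directly into code. The following is a minimal sketch in plain Python. The function name `fit_line` and the guard against a zero denominator are additions made here for illustration, not part of the formulas above.

```python
def fit_line(xs, ys):
    """Return (w, b) for the least squares line y = w*x + b."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        # All x values are identical, so no unique slope exists.
        raise ValueError("cannot fit a line when all x values are equal")

    w = (n * sum_xy - sum_x * sum_y) / denominator
    b = (sum_y - w * sum_x) / n
    return w, b


# Small hypothetical dataset where y is exactly 2*x.
print(fit_line([1, 2, 3], [2, 4, 6]))  # (2.0, 0.0)
```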

Step 4: Plug in the Values

Let’s calculate the required sums:

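| Height \((x)\) | Weight \((y)\) | \(x \cdot y\) | \(x^2\) |
|---|---|---|---|
| 160 | 50 | 8000 | 25600 |
| 165 | 55 | 9075 | 27225 |
| 170 | 60 | 10200 | 28900 |
| 175 | 65 | 11375 | 30625 |
| 180 | 70 | 12600 | 32400 |
| \(\Sigma\) = 850 | 300 | 51250 | 144750 |

  • \(n = 5\)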
  • \(\sum x_i = 850\)
  • \(\sum y_i = 300\)
  • \(\sum x_i y_i = 51250\)
  • \(\sum x_i^2 = 144750\)
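The sums are easy to get wrong by hand, so a quick check in code can help. This sketch recomputes each one from the Step 1 data. The variable names are chosen here for readability.

```python
heights = [160, 165, 170, 175, 180]  # x, in cm
weights = [50, 55, 60, 65, 70]       # y, in kg

n = len(heights)
sum_x = sum(heights)
sum_y = sum(weights)
sum_xy = sum(x * y for x, y in zip(heights, weights))
sum_x2 = sum(x ** 2 for x in heights)

print(n, sum_x, sum_y, sum_xy, sum_x2)
# prints: 5 850 300 51250 144750
```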

Calculate slope \(w\):

\(w = \frac{5 \cdot 51250 - 850 \cdot 300}{5 \cdot 144750 - 850^2} = \frac{256250 - 255000}{723750 - 722500} = \frac{1250}{1250} = 1\)

Calculate intercept \(b\):

\(b = \frac{300 - 1 \cdot 850}{5} = \frac{-550}{5} = -110\)

Final Linear Equation

\(\boxed{\text{weight} = 1 \cdot \text{height} - 110}\)

This means: for a given height (in cm), you can estimate the weight (in kg) with this formula.
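As a sanity check, the fitted line can be compared against the original data. The sketch below computes the residual (actual minus predicted weight) for each person, using the values of w and b derived above.

```python
heights = [160, 165, 170, 175, 180]  # x, in cm
weights = [50, 55, 60, 65, 70]       # y, in kg
w, b = 1, -110                       # slope and intercept derived above

# Residual = actual weight minus predicted weight for each person.
residuals = [y - (w * x + b) for x, y in zip(heights, weights)]
print(residuals)  # [0, 0, 0, 0, 0]
```

Every residual is zero because this sample data happens to lie exactly on a straight line. With real measurements the residuals would generally be nonzero, and the least squares line is the one that makes their squares sum to the smallest possible value.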

Example Use

If someone’s height is 172 cm:

\(\text{weight} = 1 \cdot 172 - 110 = 62 \text{ kg}\)
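The same calculation can be wrapped in a small helper. This is a sketch only. The name `predict_weight` is hypothetical, and the default parameters are the values fitted above.

```python
def predict_weight(height_cm, w=1.0, b=-110.0):
    """Estimate weight in kg from height in cm using weight = w * height + b."""
    return w * height_cm + b


print(predict_weight(172))  # 62.0
```

The line was fitted on heights between 160 cm and 180 cm, so estimates far outside that range are unreliable. For example, `predict_weight(100)` returns -10.0, which is not a meaningful weight.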