Upgrading to Multiple Linear Regression
Moving past single-variable equations and smashing matrix dot products into NumPy to accelerate training.

Hi, I’m Saachi! 👋 I am anundergraduate with a passion for tech. I realized that reading about Machine Learning wasn't enough, so I started Code Train Repeat—a series where I document my journey.
Univariate linear regression is a great sandbox, but real-world datasets never rely on just a single feature. If you want to predict something complex—like house prices—you need to account for multiple independent variables: size, number of bedrooms, floors, and age.
Today, I upgraded my ML-Journey repository to handle Multiple Linear Regression. This forced me to ditch basic scalar math and transition fully into linear algebra, matrix operations, and vectorized execution using NumPy on my M1 Mac.
1. Input Theory: The Multi-Feature Transition
When we introduce multiple features, our notation must expand. Here is the strict terminology framework I am using to track our data structures:
n: The total number of features (independent variables) in our model.x_j: The j-th individual feature (wherejranges from 1 ton).x_vec_i: A row vector containing all feature values for the i-th specific training example, represented mathematically as x^i.x_j_i: The specific value of featurejwithin the $i$-th training example, represented mathematically as xj^i.
Our model function f can no longer simply be wx + b. It must scale to handle every feature with its own dedicated weight:
$$f_{\vec{w},b}(\vec{x}) = w_1x_1 + w_2x_2 + \dots + w_n x_n + b$$
Where:
w_vecis a weight vector of shape(n,)containing parameters \([w_1, w_2, \dots, w_n]\), represented mathematically as \(\vec{w}\).x_vecis a feature vector of shape(n,), represented mathematically as \(\vec{x}\).bremains a single scalar bias parameter.
Expressing the Model as a Dot Product
Writing out every feature explicitly becomes mathematically messy and highly inefficient to compute. By grouping our weights into vector w_vec and our features into vector x_vec, we can compress this entire equation into a single vector dot product:
$$f_{\vec{w},b}(\vec{x}) = \vec{w} \cdot \vec{x} + b$$
2. Process to Code: Ditching For-Loops for Vectorization
The biggest bottleneck in Python is loop execution speed. If we implement our new hypothesis function using a naive for loop, Python has to step through every feature sequentially.
Naive vs. Vectorized Architectures
# Architecture A: Naive For-Loop (Sequential CPU overhead)
def f_naive(w, x, b, n):
f = 0
for i in range(n):
f += w[i] * x[i]
return f + b
# Architecture B: Vectorized Dot Product (Parallel Processing)
import numpy as np
def f_vectorized(w, x, b):
return np.dot(w, x) + b
Vectorization leverages parallel processing hardware. Instead of calculating one multiplication at a time, NumPy taps directly into low-level routines that handle these calculations simultaneously. On an M1 chip, this completely bypasses massive overhead, converting loop steps into highly optimized operations.
Vectorized Gradient Descent Math
The cost function returns a single scalar error value. However, because we now have n separate weights, we must calculate the partial derivative for every single weight w_j from 1 to n.
The update step for every parameter j during an iteration becomes:
$$w_j = w_j - \alpha \frac{1}{m} \sum_{i=1}^{m} (\vec{w} \cdot \vec{x}^{(i)} + b - y^{(i)})x_j^{(i)}$$
The bias parameter b updates exactly as it did before, utilizing the identical error residual term.
Train Model
Vectorized Execution I implemented this multi-feature model inside my local environment. To match the mathematics, the training loop computes the predictions across all m examples simultaneously using matrix-vector multiplication via np.dot.
Here is the functional script added to multiple_linear_regression.py:
import numpy as np
class MultipleLinearRegression:
def __init__(self, learning_rate=0.01, iterations=1000):
self.lr = learning_rate
self.iterations = iterations
self.w = None
self.b = None
def fit(self, X, y):
# m = examples, n = features
m, n = X.shape
# Initialize weights to zero vector, bias to scalar zero
self.w = np.zeros(n)
self.b = 0.0
for _ in range(self.iterations):
# Calculate predictions for all m examples simultaneously
# X is shape (m, n), self.w is shape (n,) -> y_pred is shape (m,)
y_pred = np.dot(X, self.w) + self.b
errors = y_pred - y
# Vectorized gradient calculations
dw = (1 / m) * np.dot(X.T, errors)
db = (1 / m) * np.sum(errors)
# Simultaneous parameter adjustment
self.w -= self.lr * dw
self.b -= self.lr * db
def predict(self, X):
return np.dot(X, self.w) + self.b
if __name__ == "__main__":
# Mock Dataset: 3 features per example (Size, Bedrooms, Age)
X_train = np.array([
[2104, 5, 45],
[1416, 3, 40],
[852, 2, 35]
], dtype=float)
# Ground truth targets (Prices in thousands)
y_train = np.array([460, 232, 178], dtype=float)
# Note: Features are unscaled, so we use a very conservative learning rate
model = MultipleLinearRegression(learning_rate=1e-7, iterations=100)
model.fit(X_train, y_train)
print(f"Trained Weight Vector (w): {model.w}")
print(f"Trained Bias Scalar (b): {model.b:.4f}")
Observation Note: Running this mock data with completely raw, unscaled features requires an incredibly tiny learning rate like
1e-7. Because the house sizes are in the thousands while bedrooms are single digits, our gradients can explode instantly ifalphais even slightly too large.



