Skip to main content

Command Palette

Search for a command to run...

What Actually IS Machine Learning?

Goals, Types, and the Lifecycle of a Model.

Published
5 min read
What Actually IS Machine Learning?
S

Hi, I’m Saachi! 👋 I am anundergraduate with a passion for tech. I realized that reading about Machine Learning wasn't enough, so I started Code Train Repeat—a series where I document my journey.

The Input: Breaking the Paradigm

For years, I programmed by writing rules.

  • If user clicks button -> Then open window.

  • If temperature > 100 -> Then boil water.

This is Traditional Programming: I give the computer the Data and the Rules, and it gives me the Answer.

Machine Learning flips this upside down. I give the computer the Data and the Answer, and it figures out the Rules.

  • Traditional: Input + Rules = Output

  • Machine Learning: Input + Output = Rules

1. The Three Types of Learning

Machine Learning isn't one thing. It generally falls into three buckets.

A. Supervised Learning (The Teacher)

This is 90% of business ML. The data comes with an "Answer Key" (Labels).

  • Analogy: A teacher shows a child flashcards. "This is a cat." "This is a dog." Eventually, the child learns to identify them on their own.

  • The Goal: Predict the label for new data.

  • Examples: Predicting House Prices (Regression), Spam Detection (Classification).

B. Unsupervised Learning (The Explorer)

The data has no labels. The machine is left alone to find patterns.

  • Analogy: Giving a child a bucket of mixed LEGOs. They naturally sort them by color or size, even if you never told them what "Red" or "Small" means.

  • The Goal: Discover hidden structures.

  • Examples: Customer Segmentation (Clustering), Recommendation Systems ("People who bought X also bought Y").

C. Reinforcement Learning (The Gamer)

The model learns by trial and error.

  • Analogy: Training a dog. Good behavior = Treat (Reward). Bad behavior = No Treat (Penalty).

  • Examples: Teaching a robot to walk, or an AI to play Chess.

2. The Vocabulary: Speaking the Language

To work in ML, you have to stop calling them "columns" and "rows."

TermDefinitionSimple Example
Features (X)The input data used to make predictions.Square Footage, Bedrooms, Zip Code.
Target / Label (y)The answer we want to predict.House Price.
TrainingThe process of the model "learning" the rules.Finding the math formula that turns X into y.
ModelThe mathematical engine (The Artifact).The formula itself (Price = SqFt x 200).
InferenceUsing the trained model on new data.Estimating the price of a house that just went on the market.

2. The Machine Learning Cycle

You don't just "train" a model once and walk away. It is a lifecycle.

A. The Golden Rule: Splitting Data

If I train my model on all my data, it will memorize the answers (Overfitting). To test if it actually learned, I need to hide some data.

The 3-Way Split:

  1. Training Set (70%): The Study Guide. The model learns from this.

  2. Validation Set (15%): The Practice Exam. We use this to tune settings (Hyperparameters) while training.

  3. Test Set (15%): The Final Exam. We lock this away in a vault and only touch it once at the very end to see how the model performs in the real world.

B. The Robust Way: K-Fold Cross-Validation

Splitting data once is risky. What if my "Test Set" just happened to be really easy?

K-Fold Cross-Validation fixes this.

  1. Split the data into K equal parts (e.g., 5 folds).

  2. Train on 4 parts, Test on 1 part.

  3. Repeat 5 times, rotating the test part.

  4. Average the scores.

Analogy: Instead of taking one final exam, the student takes 5 mini-exams covering different chapters. The average score is a better measure of their knowledge.

3. The Two Main Problems: Regression vs. Classification

In Supervised Learning (where we spend most of our time), there are only two questions we usually ask:

1. "How Much?" (Regression)

  • The answer is a Quantity (Continuous Number).

  • Examples: "What will the temperature be tomorrow?", "What is the price of Bitcoin?"

2. "Which One?" (Classification)

  • The answer is a Category (Label).

  • Examples: "Is this email Spam?", "Is this tumor Benign or Malignant?", "Is this image a Cat, Dog, or Bird?"

4. The Danger Zone: Overfitting

This is the most important concept for a beginner.

Imagine a student studying for a history test.

  • Student A memorizes the exact dates and wording of the practice questions. (He gets 100% on the practice test).

  • Student B studies the concepts and causes of the events. (She gets 90% on the practice test).

On the real exam, the questions are rephrased.

  • Student A fails. He didn't learn history; he memorized the practice sheet. This is Overfitting.

  • Student B passes. She learned the general pattern, so she can handle new data.

The Golden Rule: We don't want a model that memorizes the training data. We want a model that generalizes to new data.

5. Evaluation Metrics: Keeping Score

How do we know if the model is "good"? It depends on the problem.

For Regression (Predicting Numbers)

  • MAE (Mean Absolute Error): "On average, my prediction is off by 500." (Easy to understand).

  • MSE (Mean Squared Error): Punishes large errors heavily. (Good for math).

  • RMSE (Root Mean Squared Error): The standard metric. It’s in the same units as your target (e.g., "Dollars").

For Classification (Predicting Categories)

  • Accuracy: % of correct answers. (Trap: If 99% of patients are healthy, a model that says "Healthy" for everyone is 99% accurate but useless).

  • Precision: "Of all the people I said had cancer, how many actually did?" (Quality of positive predictions).

  • Recall: "Of all the people who actually had cancer, how many did I find?" (Quantity of positive predictions).

  • F1-Score: The balance between Precision and Recall.

Machine Learning

Part 8 of 22

Documenting my journey to master Machine Learning. The focus is on consistent practice: writing code, training models, and daily improvement. Expect detailed project notes and a transparent look at what it takes to learn ML from scratch.

Up next

Scikit-Learn & Linear Regression

Why you don't need Deep Learning for 90% of your problems.