Evaluating Models

Published on June 29, 2025 by @mritxperts

A. Introduction to Model Evaluation

After we create an AI model, we need to check if it is performing well and giving correct results. This process is called Model Evaluation.

Without evaluation, we cannot trust the predictions made by the model.


B. Why is Evaluation Important?

  • To check how accurate the model is.
  • To identify errors or wrong predictions.
  • To avoid using a model that gives biased or unfair results.
  • To improve the model by finding where it performs poorly.

C. Key Terms in Model Evaluation

| Term | Meaning |
| --- | --- |
| Prediction | The output given by the AI model based on input data |
| Actual Value | The real or correct answer from the dataset |
| True Positive (TP) | Model predicted Yes, and it was Yes |
| True Negative (TN) | Model predicted No, and it was No |
| False Positive (FP) | Model predicted Yes, but it was No (wrongly positive) |
| False Negative (FN) | Model predicted No, but it was Yes (wrongly negative) |

D. Confusion Matrix

A confusion matrix is a table used to describe the performance of a model on a set of test data.

Structure of Confusion Matrix

|  | Predicted: Yes | Predicted: No |
| --- | --- | --- |
| Actual: Yes | True Positive (TP) | False Negative (FN) |
| Actual: No | False Positive (FP) | True Negative (TN) |

This table helps us to calculate different performance metrics.
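For a quick, hands-on illustration, here is a small Python sketch that counts TP, TN, FP, and FN from lists of actual and predicted labels. The labels below are made-up examples, not data from this unit.

```python
# Made-up Yes/No labels used only to show how the four cells are counted.
actual    = ["Yes", "Yes", "No", "No", "Yes", "No"]
predicted = ["Yes", "No",  "No", "Yes", "Yes", "No"]

tp = sum(1 for a, p in zip(actual, predicted) if a == "Yes" and p == "Yes")
tn = sum(1 for a, p in zip(actual, predicted) if a == "No"  and p == "No")
fp = sum(1 for a, p in zip(actual, predicted) if a == "No"  and p == "Yes")
fn = sum(1 for a, p in zip(actual, predicted) if a == "Yes" and p == "No")

# Print the counts in the same layout as the table above.
print("              Predicted: Yes   Predicted: No")
print(f"Actual: Yes   TP = {tp}           FN = {fn}")
print(f"Actual: No    FP = {fp}           TN = {tn}")
```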


E. Accuracy

Accuracy tells us the percentage of predictions that are correct.

Formula:
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}

Example:
If TP = 50, TN = 30, FP = 10, FN = 10, then:
\text{Accuracy} = \frac{50 + 30}{50 + 30 + 10 + 10} = \frac{80}{100} = 80\%
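A short Python check of the same example, plugging in the counts above:

```python
tp, tn, fp, fn = 50, 30, 10, 10

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"Accuracy = {accuracy:.0%}")  # prints: Accuracy = 80%
```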


F. Precision

Precision tells us how many of the predicted “Yes” results were actually correct.

Formula:
\text{Precision} = \frac{TP}{TP + FP}

Example:
If TP = 50, FP = 10, then:
\text{Precision} = \frac{50}{50 + 10} = \frac{50}{60} \approx 83.3\%
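The same example as a small Python check:

```python
tp, fp = 50, 10

precision = tp / (tp + fp)
print(f"Precision = {precision:.1%}")  # prints: Precision = 83.3%
```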


G. Recall (Sensitivity or True Positive Rate)

Recall tells us how many of the actual “Yes” cases were correctly predicted by the model.

Formula:
\text{Recall} = \frac{TP}{TP + FN}

Example:
If TP = 50, FN = 10, then:
\text{Recall} = \frac{50}{50 + 10} = \frac{50}{60} \approx 83.3\%
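And the recall example in Python:

```python
tp, fn = 50, 10

recall = tp / (tp + fn)
print(f"Recall = {recall:.1%}")  # prints: Recall = 83.3%
```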


H. F1 Score

The F1 Score combines both Precision and Recall into a single value using the harmonic mean.

Formula:
F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}

F1 Score is useful when we want to balance between Precision and Recall.
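The unit gives no worked example for F1, so as an illustration the sketch below reuses TP = 50, FP = 10, FN = 10 from the Precision and Recall sections above:

```python
tp, fp, fn = 50, 10, 10

precision = tp / (tp + fp)          # ≈ 0.833
recall    = tp / (tp + fn)          # ≈ 0.833
f1 = 2 * (precision * recall) / (precision + recall)
print(f"F1 Score = {f1:.1%}")       # ≈ 83.3%; equal precision and recall give the same F1
```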


I. Example Case Study – Spam Email Classifier

| Email | Actual | Predicted |
| --- | --- | --- |
| Email 1 | Spam | Spam |
| Email 2 | Not Spam | Spam |
| Email 3 | Spam | Not Spam |
| Email 4 | Not Spam | Not Spam |

From this data:

  • TP = 1 (Spam predicted as Spam)
  • FP = 1 (Not Spam predicted as Spam)
  • FN = 1 (Spam predicted as Not Spam)
  • TN = 1 (Not Spam predicted as Not Spam)

Now, calculate:

  • Accuracy = (TP + TN) / Total = (1 + 1) / 4 = 50%
  • Precision = TP / (TP + FP) = 1 / (1 + 1) = 50%
  • Recall = TP / (TP + FN) = 1 / (1 + 1) = 50%

With every metric at 50%, the model is doing no better than random guessing, so it is not very reliable.
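If you want to verify these figures in code, here is one possible sketch using scikit-learn (a library choice of ours, not something the unit prescribes):

```python
# Reproduces the spam-classifier case study with scikit-learn.
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

actual    = ["Spam", "Not Spam", "Spam", "Not Spam"]
predicted = ["Spam", "Spam", "Not Spam", "Not Spam"]

# Rows = actual, columns = predicted, in the order Spam, Not Spam.
print(confusion_matrix(actual, predicted, labels=["Spam", "Not Spam"]))
print("Accuracy :", accuracy_score(actual, predicted))                     # 0.5
print("Precision:", precision_score(actual, predicted, pos_label="Spam"))  # 0.5
print("Recall   :", recall_score(actual, predicted, pos_label="Spam"))     # 0.5
```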


J. When to Use Which Metric

| Metric | Use When |
| --- | --- |
| Accuracy | Data is balanced (equal Yes/No outcomes) |
| Precision | You want to avoid false positives (e.g., fraud detection) |
| Recall | You want to avoid false negatives (e.g., disease detection) |
| F1 Score | You want a balance between precision and recall |

K. Common Mistakes in Model Evaluation

  1. Only checking accuracy (not enough in real problems).
  2. Ignoring false positives and false negatives.
  3. Not checking for bias or fairness in predictions.
  4. Using small or unbalanced test data.

L. Activity Suggestion

Give students a mini dataset of predicted and actual values. Ask them to:

  • Build a confusion matrix.
  • Calculate Accuracy, Precision, and Recall.
  • Interpret what the values tell us about the model.

M. Keywords to Remember

| Term | Description |
| --- | --- |
| Confusion Matrix | Table showing TP, TN, FP, FN |
| Accuracy | Percentage of correct predictions |
| Precision | Out of all predicted “Yes,” how many were actually “Yes” |
| Recall | Out of all actual “Yes,” how many were predicted correctly |
| F1 Score | A single score combining precision and recall |
| TP, TN, FP, FN | Counts of different correct and incorrect predictions |

N. Summary of the Unit

  • Evaluation is a key step to check how well an AI model performs.
  • A confusion matrix helps organize the outcomes.
  • Important metrics include Accuracy, Precision, Recall, and F1 Score.
  • The choice of metric depends on the type of problem you are solving.