Statistical Data

Published on June 29, 2025 by @mritxperts

A. Introduction to Statistical Data

Statistics is a branch of mathematics that helps us collect, organize, analyze, and interpret data.

In Artificial Intelligence, statistical data is used to:

  • Understand data patterns
  • Make predictions
  • Train and test models

Before building any AI model, we need to study the data using basic statistics.


B. Types of Data

1. Qualitative Data (Categorical)

  • Data that describes qualities or categories.
  • Cannot be measured numerically.
  • Examples: Gender, color, type of food, etc.

2. Quantitative Data (Numerical)

  • Data that can be measured or counted.
  • Can be divided into:
    • Discrete Data: Countable (e.g., number of students)
    • Continuous Data: Measurable (e.g., height, weight, temperature)

C. Measures of Central Tendency

These are values that represent the center or average of a dataset.

1. Mean (Average)

  • Sum of all data values divided by the number of values.

Formula:
$$\text{Mean} = \frac{\text{Sum of all values}}{\text{Number of values}}$$

Example:
Marks = 40, 50, 60
Mean = (40 + 50 + 60) / 3 = 150 / 3 = 50
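
The same calculation can be sketched in Python using only built-in functions. The variable names `marks` and `mean` are chosen here just for illustration.

```python
# Marks from the example above
marks = [40, 50, 60]

# Mean = sum of all values / number of values
mean = sum(marks) / len(marks)

print("Mean:", mean)  # prints: Mean: 50.0
```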


2. Median

  • The middle value when data is arranged in order.
  • If there is an even number of values, the median is the average of the two middle values.

Example (Odd):
Marks = 20, 30, 40 → Median = 30

Example (Even):
Marks = 20, 30, 40, 50 → Median = (30 + 40)/2 = 35
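
The odd and even cases can be sketched as one small Python function. The helper name `median` is hypothetical and written here only to show the steps; Python's standard library already provides `statistics.median` for real use.

```python
def median(values):
    """Return the middle value of a list (illustrative helper)."""
    ordered = sorted(values)   # arrange the data in order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: there is a single middle value
        return ordered[mid]
    # Even number of values: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([20, 30, 40]))      # 30
print(median([20, 30, 40, 50]))  # 35.0
```

Sorting first matters: the function gives the same answer even if the marks are entered out of order.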


3. Mode

  • The value that appears most frequently.

Example:
Marks = 20, 30, 30, 40 → Mode = 30
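
As a minimal sketch, the mode can be found with `multimode` from Python's built-in `statistics` module (available in Python 3.8 and later). The second list is made-up data, added to show that a dataset can have more than one mode.

```python
from statistics import multimode

marks = [20, 30, 30, 40]

# multimode returns a list of every value that shares the highest frequency
modes = multimode(marks)
print("Mode(s):", modes)  # Mode(s): [30]

# Made-up data where two values are tied for most frequent
print(multimode([20, 20, 30, 30, 40]))  # [20, 30]
```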


D. Measures of Dispersion (Spread of Data)

These measures tell us how spread out or varied the data is.

1. Range

  • The difference between the maximum and minimum values.

Formula:
$$\text{Range} = \text{Maximum value} - \text{Minimum value}$$

Example:
Marks = 20, 30, 40
Range = 40 – 20 = 20
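
In Python, the range can be sketched with the built-in `max` and `min` functions. The name `data_range` is chosen here to avoid clashing with Python's own `range` function.

```python
marks = [20, 30, 40]

# Range = maximum value - minimum value
data_range = max(marks) - min(marks)

print("Range:", data_range)  # Range: 20
```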


2. Standard Deviation (optional, for deeper understanding)

  • Shows how much the values differ from the mean.
  • A low standard deviation means the data is close to the mean.
  • A high standard deviation means the data is spread out.
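
The text above does not give a formula, so the following is only a sketch under one common assumption: the population standard deviation, which is the square root of the average squared difference from the mean. (The sample version divides by n - 1 instead of n.) The helper `population_std` and both lists of marks are made up for illustration.

```python
import math
from statistics import pstdev

def population_std(values):
    """Population standard deviation (illustrative helper)."""
    mean = sum(values) / len(values)
    # Squared difference of each value from the mean
    squared_diffs = [(x - mean) ** 2 for x in values]
    variance = sum(squared_diffs) / len(values)
    return math.sqrt(variance)

close_marks = [49, 50, 51]    # values close to the mean
spread_marks = [20, 50, 80]   # values spread far from the mean

print(round(population_std(close_marks), 2))   # 0.82
print(round(population_std(spread_marks), 2))  # 24.49

# The standard library's pstdev should agree with the manual calculation
print(round(pstdev(spread_marks), 2))          # 24.49
```

Both lists have the same mean (50), but the second has a much larger standard deviation because its values lie farther from the mean.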

E. Data Visualization Techniques

Data can be better understood using charts and graphs.

1. Bar Graph

  • Used to represent categorical data.
  • Each bar shows the frequency of a category.
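
Before drawing a bar graph, we need the frequency of each category. The sketch below counts made-up favorite colors with `collections.Counter` and prints a simple text version of the bars; a real chart would normally be drawn with a plotting library.

```python
from collections import Counter

# Made-up sample data: favorite colors of six students
favorite_colors = ["red", "blue", "red", "green", "blue", "red"]

# Count how many times each category appears
frequency = Counter(favorite_colors)

# Print one text "bar" per category, one # per student
for category, count in frequency.items():
    print(f"{category:<6} {'#' * count} ({count})")
```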

2. Histogram

  • Used to represent numerical data (like marks or height).
  • Shows how data is distributed over intervals.
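
The idea of grouping numerical data into intervals can be sketched as follows. The marks and the interval width of 20 are assumptions made for this example, and the text bars stand in for the bars of a real histogram.

```python
# Made-up sample data: marks of ten students
marks = [35, 42, 48, 51, 55, 58, 63, 67, 72, 88]

bin_width = 20   # assumed interval size
bins = {}        # maps the start of each interval to a count

for m in marks:
    # e.g. 42 falls in the interval that starts at 40
    start = (m // bin_width) * bin_width
    bins[start] = bins.get(start, 0) + 1

# Print the intervals in order, one # per student
for start in sorted(bins):
    print(f"{start}-{start + bin_width - 1}: {'#' * bins[start]}")
```

Unlike a bar graph, the bars here represent ranges of numbers rather than separate categories.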

3. Pie Chart

  • Circular chart divided into slices.
  • Each slice shows the proportion of a category.
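
The size of each slice comes from the category's share of the total. As a rough sketch with made-up fruit counts, the share can be expressed as a percentage and as an angle out of 360 degrees:

```python
# Made-up sample data: number of students who prefer each fruit
counts = {"apple": 5, "banana": 3, "mango": 2}

total = sum(counts.values())

for fruit, count in counts.items():
    percent = count * 100 / total   # share of the whole, in percent
    angle = count * 360 / total     # size of the slice, in degrees
    print(f"{fruit}: {percent}% of the chart, {angle} degrees")
```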

4. Line Graph

  • Shows trends over time (e.g., temperature by days).

F. Using Python Libraries to Analyze Data (Overview Only)

  • NumPy and Pandas libraries in Python help us perform statistical calculations.
  • Matplotlib and Seaborn are used for drawing graphs.

Example using Python (For practical understanding):

import numpy as np

# Marks of three students
marks = [40, 50, 60]

# NumPy provides ready-made functions for mean and median
mean = np.mean(marks)
median = np.median(marks)

print("Mean:", mean)
print("Median:", median)

G. Real-Life Examples

| Scenario | Statistical Method Used |
| --- | --- |
| Checking average marks in a class | Mean |
| Finding the most common favorite fruit | Mode |
| Analyzing the difference in heights | Range / Standard Deviation |
| Showing number of boys vs. girls | Bar Graph |

H. Activity Suggestion for Students

Ask students to collect data from their classmates, such as:

  • Marks in a test
  • Height in cm
  • Favorite color

Then:

  • Calculate Mean, Median, and Mode
  • Draw a bar graph or pie chart

Discuss:

  • Which measure gives the best idea of the data?
  • Are there any outliers (values that are much higher or lower than the rest)?
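
To support the discussion, the sketch below uses made-up marks to show how a single outlier can affect the mean much more than the median. It relies on `mean` and `median` from Python's built-in `statistics` module.

```python
from statistics import mean, median

# Made-up sample data: marks of five students
marks = [40, 45, 50, 55, 60]

# The same marks plus one unusually high value (an outlier)
marks_with_outlier = marks + [110]

print("Without outlier:", mean(marks), median(marks))
print("With outlier:   ", mean(marks_with_outlier), median(marks_with_outlier))
```

In this example the mean rises from 50 to 60, while the median only moves from 50 to 52.5, which is one reason the median can give a better idea of the data when outliers are present.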

I. Keywords to Remember

| Term | Meaning |
| --- | --- |
| Mean | The average value of data |
| Median | The middle value in ordered data |
| Mode | The most frequently occurring value |
| Range | The difference between the highest and lowest values |
| Qualitative Data | Data in categories or labels |
| Quantitative Data | Data in numbers or measurable quantities |
| Bar Graph | Chart to represent categories |
| Histogram | Chart to show frequency distribution |

J. Summary of the Unit

  • Statistics helps us understand data through averages and visualizations.
  • We learned how to calculate Mean, Median, Mode, and Range.
  • We understood how to use charts and graphs to visualize data.
  • These tools are essential for exploring data before building an AI model.