A. Introduction to Statistical Data
Statistics is a branch of mathematics that helps us collect, organize, analyze, and interpret data.
In Artificial Intelligence, statistical data is used to:
- Understand data patterns
- Make predictions
- Train and test models
Before building any AI model, we need to study the data using basic statistics.
B. Types of Data
1. Qualitative Data (Categorical)
- Data that describes qualities or categories.
- Cannot be measured on a numerical scale; the categories can be counted, but not measured or averaged.
- Examples: Gender, color, type of food, etc.
2. Quantitative Data (Numerical)
- Data that can be measured or counted.
- Can be divided into:
  - Discrete Data: Countable values, usually whole numbers (e.g., number of students)
  - Continuous Data: Measured values that can take any value within a range, including fractions (e.g., height, weight, temperature)
C. Measures of Central Tendency
These are values that represent the center or average of a dataset.
1. Mean (Average)
- Sum of all data values divided by the number of values.
Formula:
$$\text{Mean} = \frac{\text{Sum of all values}}{\text{Number of values}}$$
Example:
Marks = 40, 50, 60
Mean = (40 + 50 + 60) / 3 = 150 / 3 = 50
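The formula above translates directly into a few lines of Python. This is a minimal sketch; `compute_mean` is a hypothetical helper name chosen for illustration, not a library function.

```python
def compute_mean(values):
    """Return the arithmetic mean: sum of all values / number of values."""
    if not values:
        raise ValueError("the mean of an empty list is undefined")
    return sum(values) / len(values)

marks = [40, 50, 60]
print("Mean:", compute_mean(marks))  # Mean: 50.0
```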
2. Median
- The middle value when data is arranged in order.
- If there is an even number of values, the median is the average of the two middle values.
Example (Odd):
Marks = 20, 30, 40 → Median = 30
Example (Even):
Marks = 20, 30, 40, 50 → Median = (30 + 40)/2 = 35
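The median rule can be sketched in Python as follows; `compute_median` is a hypothetical helper written for illustration. It sorts the data first, then either picks the single middle value or averages the two middle values.

```python
def compute_median(values):
    """Return the middle value of the sorted data (average of the two middle values if the count is even)."""
    if not values:
        raise ValueError("the median of an empty list is undefined")
    ordered = sorted(values)  # step 1: arrange the data in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:            # odd count: exactly one middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values

print(compute_median([20, 30, 40]))      # 30
print(compute_median([20, 30, 40, 50]))  # 35.0
```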
3. Mode
- The value that appears most frequently.
Example:
Marks = 20, 30, 30, 40 → Mode = 30
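A short sketch of finding the mode in Python, assuming the data fits in a list. It uses `collections.Counter` from the standard library to count how often each value appears; `compute_modes` is a hypothetical helper name. It returns a list because a dataset can have more than one most frequent value.

```python
from collections import Counter

def compute_modes(values):
    """Return a list of the most frequent value(s)."""
    if not values:
        return []
    counts = Counter(values)           # value -> number of times it appears
    highest = max(counts.values())     # the largest frequency
    return [value for value, count in counts.items() if count == highest]

print(compute_modes([20, 30, 30, 40]))  # [30]
print(compute_modes([20, 20, 30, 30]))  # [20, 30]
```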
D. Measures of Dispersion (Spread of Data)
These measures tell us how spread out or varied the data is.
1. Range
- The difference between the maximum and minimum values.
Formula:
$$\text{Range} = \text{Maximum value} - \text{Minimum value}$$
Example:
Marks = 20, 30, 40
Range = 40 - 20 = 20
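The range formula is a single subtraction in Python; `compute_range` is a hypothetical helper name used only for illustration.

```python
def compute_range(values):
    """Return the difference between the largest and smallest values."""
    return max(values) - min(values)

marks = [20, 30, 40]
print("Range:", compute_range(marks))  # Range: 20
```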
2. Standard Deviation (advanced; optional for beginners)
- Measures how far, on average, the values lie from the mean.
- A low standard deviation means the data is close to the mean.
- A high standard deviation means the data is spread out.
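To make "spread" concrete, the sketch below computes the population standard deviation: the square root of the average squared distance of each value from the mean, dividing by the number of values n. Some tools divide by n - 1 instead (the sample standard deviation), so their results can differ slightly. The two mark lists are made-up examples that share the same mean (50); `population_std` is a hypothetical helper name.

```python
import math

def population_std(values):
    """Square root of the average squared distance from the mean (divides by n)."""
    mean = sum(values) / len(values)
    squared_diffs = [(x - mean) ** 2 for x in values]  # distance of each value from the mean, squared
    return math.sqrt(sum(squared_diffs) / len(values))

close_to_mean = [48, 50, 52]  # hypothetical marks, mean 50
spread_out = [20, 50, 80]     # hypothetical marks, also mean 50

print(round(population_std(close_to_mean), 2))  # 1.63
print(round(population_std(spread_out), 2))     # 24.49
```

Both lists have the same mean, but the second has a much larger standard deviation because its values sit far from 50.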
E. Data Visualization Techniques
Data can be better understood using charts and graphs.
1. Bar Graph
- Used to represent categorical data.
- Each bar shows the frequency of a category.
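The bar heights in a bar graph are simply category counts. A minimal sketch, using made-up favorite-fruit answers, that counts each category with `collections.Counter` and prints a simple text bar for each one, most frequent first:

```python
from collections import Counter

# Hypothetical survey answers (favorite fruit) from a class
answers = ["apple", "mango", "apple", "banana", "mango", "apple"]

frequencies = Counter(answers)  # category -> number of students (the bar heights)
for fruit, count in frequencies.most_common():
    print(f"{fruit:<8}{'#' * count} ({count})")
```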
2. Histogram
- Used to represent numerical data (like marks or height).
- Shows how data is distributed over intervals.
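Before a histogram can be drawn, the numerical values are grouped into intervals (bins) and counted. The sketch below assumes equal-width bins; `histogram_counts`, the marks, and the bin settings are all hypothetical. Each bin includes its lower edge but not its upper edge, except the last bin, which also includes its upper edge (the same convention `numpy.histogram` uses). Values outside the bins are ignored.

```python
def histogram_counts(values, start, width, num_bins):
    """Count how many values fall into each equal-width interval starting at `start`."""
    counts = [0] * num_bins
    end = start + num_bins * width
    for v in values:
        if v == end:
            counts[-1] += 1                          # last bin also includes its upper edge
        elif start <= v < end:
            counts[int((v - start) // width)] += 1   # bin index from distance above `start`
        # values below `start` or above `end` are skipped
    return counts

marks = [35, 42, 47, 51, 55, 58, 63, 71, 78, 80]  # hypothetical marks
counts = histogram_counts(marks, start=30, width=10, num_bins=5)

for i, count in enumerate(counts):
    low = 30 + i * 10
    print(f"{low}-{low + 10}: {'#' * count}")
```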
3. Pie Chart
- Circular chart divided into slices.
- Each slice shows the proportion of a category.
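Each slice's size comes from the category's share of the total: share = count / total, and the slice angle is share * 360 degrees. A sketch with made-up favorite-color data:

```python
from collections import Counter

colors = ["red", "blue", "blue", "green", "blue", "red", "blue", "green"]  # hypothetical favorite colors
counts = Counter(colors)
total = sum(counts.values())

for color, count in counts.items():
    share = count / total  # proportion of the whole
    angle = share * 360    # slice size in degrees
    print(f"{color}: {share:.0%} of the class, {angle:.0f} degrees")
```

Here red and green each get a 90-degree slice and blue gets 180 degrees; the angles always add up to 360.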
4. Line Graph
- Shows how a value changes over time (e.g., temperature recorded each day).
F. Using Python Libraries to Analyze Data (Overview Only)
- The NumPy and Pandas libraries in Python help us perform statistical calculations.
- Matplotlib and Seaborn are used for drawing graphs.
Example using Python (For practical understanding):
```python
import numpy as np

marks = [40, 50, 60]
mean = np.mean(marks)      # (40 + 50 + 60) / 3
median = np.median(marks)  # middle value of the sorted data
print("Mean:", mean)
print("Median:", median)
```
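The same NumPy library can also compute the measures of dispersion from Section D. A sketch using the same marks; `np.std` divides by n by default (`ddof=0`, the population standard deviation), while passing `ddof=1` divides by n - 1 (the sample standard deviation).

```python
import numpy as np

marks = np.array([40, 50, 60])

data_range = np.max(marks) - np.min(marks)  # Range = maximum - minimum
std_dev = np.std(marks)                     # population standard deviation (ddof=0 by default)
sample_std = np.std(marks, ddof=1)          # sample standard deviation: divides by n - 1

print("Range:", data_range)
print("Standard deviation:", round(float(std_dev), 2))
print("Sample standard deviation:", round(float(sample_std), 2))
```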
G. Real-Life Examples
Scenario | Statistical Method or Chart Used |
---|---|
Checking average marks in a class | Mean |
Finding the most common favorite fruit | Mode |
Analyzing how much heights vary | Range / Standard Deviation |
Showing the number of boys vs. girls | Bar Graph |
H. Activity Suggestion for Students
Ask students to collect data from their classmates, such as:
- Marks in a test
- Height in cm
- Favorite color
Then:
- Calculate Mean, Median, and Mode
- Draw a bar graph or pie chart
Discuss:
- Which measure gives the best idea of the data?
- Are there any outliers (values that are unusually high or low compared with the rest)?
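To explore the discussion questions, the sketch below compares the mean and median with and without one unusually low mark. It uses Python's standard-library `statistics` module (swapped in for NumPy so nothing needs installing); the marks are made up for illustration.

```python
import statistics

marks = [62, 65, 68, 70, 71, 12]  # hypothetical test marks; 12 is an outlier
without_outlier = [m for m in marks if m != 12]

print("With outlier:    mean =", statistics.mean(marks),
      " median =", statistics.median(marks))
print("Without outlier: mean =", statistics.mean(without_outlier),
      " median =", statistics.median(without_outlier))
```

With these numbers, the single low mark pulls the mean down from 67.2 to 58, while the median only moves from 68 to 66.5. This is why the median often gives a better idea of a typical value when outliers are present.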
I. Keywords to Remember
Term | Meaning |
---|---|
Mean | The average value of data |
Median | The middle value in ordered data |
Mode | The most frequently occurring value |
Range | The difference between the highest and lowest values |
Qualitative Data | Data in categories or labels |
Quantitative Data | Data in numbers or measurable quantities |
Bar Graph | Chart to represent categories |
Histogram | Chart to show frequency distribution |
J. Summary of the Unit
- Statistics helps us understand data through averages and visualizations.
- We learned how to calculate Mean, Median, Mode, and Range.
- We understood how to use charts and graphs to visualize data.
- These tools are essential for exploring data before building an AI model.