Data Handling Using Pandas | Class 12 Informatics Practices Notes - Itxperts

📌 Introduction to Pandas

Pandas is a fast, powerful, flexible open-source data analysis and manipulation library for Python, built on top of NumPy. It is designed to work with structured, tabular data such as Excel sheets, CSV files, and SQL tables.

Key Features:

Easy handling of missing data
Automatic and explicit data alignment
Powerful group-by functionality
Time series functionality

Importing Pandas

import pandas as pd

Explanation: This imports pandas and assigns it the alias pd, which is a widely used convention.

🧰 Pandas Data Structures

1. Series

A Series is a one-dimensional array with labels (indexes).

Example:

import numpy as np
s = pd.Series([10, 20, np.nan, 40])
print(s)

Output:

0    10.0
1    20.0
2     NaN
3    40.0
dtype: float64

Explanation:

NaN indicates a missing value. Because NaN is a float, the whole Series is stored as float64.
Index is auto-generated (0, 1, 2, …).
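
A Series does not have to use the default 0, 1, 2, … index. The sketch below assumes a small set of made-up subject marks and shows custom labels, with access both by label and by position.

```python
import pandas as pd

# Hypothetical marks data, used only for illustration
marks = pd.Series([85, 92, 78], index=['Maths', 'Physics', 'Chemistry'])

print(marks['Physics'])   # access by label
print(marks.iloc[0])      # access by position (first element)
```

Label access reads naturally when the index carries meaning, while .iloc[] always counts from position 0 regardless of the labels.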

2. DataFrame

A DataFrame is a 2D labeled data structure with columns of potentially different types.

Example:

data = {
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30]
}
df = pd.DataFrame(data)
print(df)

Output:

    Name  Age
0  Alice   25
1    Bob   30

Explanation:

Dictionary keys become column names.
Each list of values becomes the data for its column; rows get a default integer index (0, 1, …).

💪 Basic Operations

Viewing Data

print(df.head())
print(df.tail())

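Output:

    Name  Age
0  Alice   25
1    Bob   30
    Name  Age
0  Alice   25
1    Bob   30

Explanation:

.head() shows the first 5 rows by default.
.tail() shows the last 5 rows by default.
df has only two rows, so both calls print the whole DataFrame. Pass a number, such as df.head(1), to change how many rows are returned.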
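
The effect of head() and tail() is easier to see on a longer table. This is a minimal sketch that assumes a made-up DataFrame of ten numbers.

```python
import pandas as pd

# Hypothetical 10-row DataFrame, used only for illustration
numbers = pd.DataFrame({'N': range(1, 11)})

first_three = numbers.head(3)   # rows with N = 1, 2, 3
last_two = numbers.tail(2)      # rows with N = 9, 10

print(first_three)
print(last_two)
```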

Structure and Summary

print(df.shape)
print(df.columns)
print(df.dtypes)
df.info()  # info() prints its report itself and returns None, so print() is not needed

Output:

(2, 2)
Index(['Name', 'Age'], dtype='object')
Name    object
Age      int64
dtype: object
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    2 non-null      object
 1   Age     2 non-null      int64 
dtypes: int64(1), object(1)
memory usage: 160.0 bytes

Explanation:

.shape returns dimensions.
.columns lists column names.
.dtypes shows data types.
.info() summarizes the index range, column data types, non-null counts, and memory usage.

🔧 Data Selection

Column Selection

print(df['Name'])

Output:

0    Alice
1      Bob
Name: Name, dtype: object

Explanation: Selects a single column as Series.

Row Selection

print(df.loc[0])
print(df.iloc[1])

Output:

Name    Alice
Age        25
Name: 0, dtype: object

Name    Bob
Age      30
Name: 1, dtype: object

Explanation:

.loc[] uses label-based indexing.
.iloc[] uses integer position.
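
With the default index, labels and positions coincide, so .loc[0] and .iloc[0] return the same row. The difference shows up once the index is not 0, 1, 2, …. The sketch below assumes made-up roll numbers as the index.

```python
import pandas as pd

# Hypothetical data: roll numbers 101-103 are used as the index
students = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Chitra'], 'Age': [25, 30, 28]},
    index=[101, 102, 103]
)

by_label = students.loc[102]      # row whose index label is 102
by_position = students.iloc[0]    # first row, whatever its label

print(by_label)
print(by_position)
```

Here students.loc[0] would raise a KeyError, because no row carries the label 0.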

🧹 Data Cleaning

Handling Missing Values

df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [4, np.nan, 6]})
print(df.isnull())
print(df.fillna(0))

Output:

       A      B
0  False  False
1  False   True
2   True  False

     A    B
0  1.0  4.0
1  2.0  0.0
2  0.0  6.0

Explanation:

isnull() shows where values are missing.
fillna(0) replaces all missing values with 0.
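
Filling with 0 is only one option. The sketch below reuses the same two-column data under a different variable name (df_missing, chosen here for illustration) and shows counting the missing values, dropping incomplete rows, and filling with each column's mean.

```python
import numpy as np
import pandas as pd

df_missing = pd.DataFrame({'A': [1, 2, np.nan], 'B': [4, np.nan, 6]})

# Number of missing values in each column
null_counts = df_missing.isnull().sum()

# Keep only the rows that have no missing value
complete_rows = df_missing.dropna()

# Replace each missing value with the mean of its own column
filled_with_mean = df_missing.fillna(df_missing.mean())

print(null_counts)
print(complete_rows)
print(filled_with_mean)
```

Which approach is appropriate depends on the data: dropping rows loses information, while filling changes the statistics of the column.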

🔁 Sorting and Filtering

Sorting

print(df.sort_values(by='A'))

Explanation: Sorts the rows by column ‘A’ in ascending order and returns a new DataFrame; df itself is unchanged.

Filtering

print(df[df['A'] > 1])

Explanation: Returns rows where column ‘A’ has values greater than 1.
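
Conditions can be combined with & (and) and | (or); each condition needs its own parentheses. This is a minimal sketch on made-up staff data.

```python
import pandas as pd

# Hypothetical staff data, used only for illustration
staff = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'John', 'Priya'],
    'Age': [30, 28, 25, 26],
    'Salary': [70000, 60000, 75000, 72000]
})

# AND: younger than 30 and earning more than 70000
young_high = staff[(staff['Age'] < 30) & (staff['Salary'] > 70000)]

# OR: aged 30 or above, or earning less than 65000
either = staff[(staff['Age'] >= 30) | (staff['Salary'] < 65000)]

print(young_high)
print(either)
```

Python's and / or keywords do not work here, because the comparison produces a whole Series of True/False values rather than a single value.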

📈 Aggregation & Statistical Functions in Pandas

Sample DataFrame

import pandas as pd

df = pd.DataFrame({
    'Employee': ['Vikram', 'Alice', 'Bob', 'John', 'Priya'],
    'Salary': [90000, 80000, 60000, 60000, 80000],
    'Experience': [5, 4, 3, 3, 4]
})

1. `max()` – Maximum value

print(df['Salary'].max())

Output:

90000

Explanation: Returns the highest salary in the column.

2. `min()` – Minimum value

print(df['Salary'].min())

Output:

60000

Explanation: Returns the lowest salary.

3. `count()` – Count of non-null values

print(df['Salary'].count())

Output:

5

Explanation: Counts the number of non-missing entries in the ‘Salary’ column.

4. `mean()` – Average value

print(df['Salary'].mean())

Output:

74000.0

Explanation: Returns the arithmetic mean of all salaries.

5. `mode()` – Most frequent value(s)

print(df['Salary'].mode())

Output:

0    60000
1    80000
Name: Salary, dtype: int64

Explanation: mode() returns a Series because there can be more than one mode; here both 60000 and 80000 appear twice.

6. `median()` – Middle value

print(df['Salary'].median())

Output:

80000.0

Explanation: Middle salary value when sorted.

7. `std()` – Standard deviation

print(df['Salary'].std())

Output:

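13416.40786

Explanation: Measures how far salaries typically spread from the mean. Pandas computes the sample standard deviation (dividing by n - 1) by default.

8. `var()` – Variance

print(df['Salary'].var())

Output:

180000000.0

Explanation: The sample variance, that is, the sum of squared deviations from the mean divided by n - 1. It is the square of the standard deviation.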
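
By default, pandas divides by n - 1 (sample statistics, ddof=1). Passing ddof=0 divides by n instead (population statistics). The sketch below works through both on the same five salaries.

```python
import pandas as pd

salary = pd.Series([90000, 80000, 60000, 60000, 80000])

# Mean is 74000; squared deviations add up to 720,000,000
sample_var = salary.var()              # 720,000,000 / 4
population_var = salary.var(ddof=0)    # 720,000,000 / 5
population_std = salary.std(ddof=0)    # square root of the population variance

print(sample_var)
print(population_var)
print(population_std)
```

Textbook formulas that divide by n correspond to ddof=0, so results may differ from the pandas default unless ddof is set.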

9. `corr()` – Correlation matrix

print(df.corr(numeric_only=True))

Output:

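              Salary  Experience
Salary      1.000000    0.979958
Experience  0.979958    1.000000

Explanation: Measures how strongly salary and experience are related, on a scale from -1 to 1 (1 = perfect positive). A value close to 1, as here, means salary rises almost in step with experience.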

10. `cov()` – Covariance matrix

print(df.cov(numeric_only=True))

Output:

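                 Salary  Experience
Salary      180000000.0     11000.0
Experience      11000.0         0.7

Explanation: Shows how salary and experience vary together. The diagonal holds each column's variance; a positive off-diagonal value means the two columns tend to increase together.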

📆 Working with Dates

dates = pd.date_range('2023-01-01', periods=3)
df = pd.DataFrame({'Date': dates, 'Visitors': [100, 200, 300]})
df['Day'] = df['Date'].dt.day_name()
print(df)

Output:

        Date  Visitors      Day
0 2023-01-01       100   Sunday
1 2023-01-02       200   Monday
2 2023-01-03       300  Tuesday

Explanation:

Generates a range of dates and adds a column for day names.
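
Dates often arrive as text, for example from a CSV file. The sketch below assumes a made-up visits table with string dates, converts them with pd.to_datetime(), and then extracts the month and year through the .dt accessor.

```python
import pandas as pd

# Hypothetical data: dates stored as plain strings
visits = pd.DataFrame({
    'Date': ['2023-01-01', '2023-02-15'],
    'Visitors': [100, 200]
})

# Convert the text column to real datetime values
visits['Date'] = pd.to_datetime(visits['Date'])

visits['Month'] = visits['Date'].dt.month
visits['Year'] = visits['Date'].dt.year

print(visits)
```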

🔀 Merging and Joining

founders = pd.DataFrame({
    'Company': ['Itxperts', 'TechNova'],
    'Founder': ['Vikram Singh Rawat', 'Sara Khan']
})

revenue = pd.DataFrame({
    'Company': ['Itxperts', 'TechNova'],
    'Revenue': [5000000, 3000000]
})

result = pd.merge(founders, revenue, on='Company')
print(result)

Output:

    Company             Founder  Revenue
0  Itxperts  Vikram Singh Rawat  5000000
1  TechNova           Sara Khan  3000000

Explanation: Merges two DataFrames using the common column ‘Company’.
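
pd.merge() performs an inner join by default, so only companies present in both DataFrames are kept. The sketch below adds a made-up company ('DataNest') and founder that have no revenue entry, to show how how='left' differs.

```python
import pandas as pd

founders = pd.DataFrame({
    # 'DataNest' and 'Ravi Mehta' are hypothetical, added for illustration
    'Company': ['Itxperts', 'TechNova', 'DataNest'],
    'Founder': ['Vikram Singh Rawat', 'Sara Khan', 'Ravi Mehta']
})

revenue = pd.DataFrame({
    'Company': ['Itxperts', 'TechNova'],
    'Revenue': [5000000, 3000000]
})

# Default inner join: DataNest is dropped because it has no revenue row
inner = pd.merge(founders, revenue, on='Company')

# Left join: every row of founders is kept; missing revenue becomes NaN
left = pd.merge(founders, revenue, on='Company', how='left')

print(inner)
print(left)
```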

🔃 Pivot Tables

data = pd.DataFrame({
    'Company': ['Itxperts', 'Itxperts', 'TechNova', 'TechNova'],
    'Quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'Profit': [120000, 150000, 100000, 110000]
})

pivot = data.pivot_table(values='Profit', index='Company', columns='Quarter')
print(pivot)

Output:

Quarter         Q1        Q2
Company                     
Itxperts  120000.0  150000.0
TechNova  100000.0  110000.0

Explanation: Creates a pivot table showing profits by quarter for each company. pivot_table() aggregates with the mean by default, so the values appear as floats.
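
The aggregation matters when a company has more than one entry for the same quarter. This sketch assumes made-up figures in which Itxperts has two Q1 entries, and compares the default mean with aggfunc='sum'.

```python
import pandas as pd

# Hypothetical data: Itxperts has two profit entries for Q1
data = pd.DataFrame({
    'Company': ['Itxperts', 'Itxperts', 'Itxperts', 'TechNova'],
    'Quarter': ['Q1', 'Q1', 'Q2', 'Q1'],
    'Profit': [100000, 140000, 150000, 100000]
})

# Default aggregation is the mean of the duplicate entries
mean_pivot = data.pivot_table(values='Profit', index='Company', columns='Quarter')

# aggfunc='sum' adds the duplicate entries instead
sum_pivot = data.pivot_table(values='Profit', index='Company',
                             columns='Quarter', aggfunc='sum')

print(mean_pivot)
print(sum_pivot)
```

TechNova has no Q2 entry in this data, so that cell is NaN in both tables.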

📊 Data Visualization (Requires matplotlib)

import matplotlib.pyplot as plt

sales = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar'],
    'Itxperts': [30000, 35000, 40000]
})

sales.plot(x='Month', y='Itxperts', kind='bar', title='Itxperts Monthly Sales')
plt.ylabel('Revenue')
plt.show()

Explanation: Plots a bar chart of monthly revenue for Itxperts.

📝 Practice Questions and Answers

Q1. Create a DataFrame for 5 employees of Itxperts with columns: Name, Age, Department, and Salary.

df = pd.DataFrame({
    'Name': ['Vikram Singh Rawat', 'Alice', 'Bob', 'John', 'Priya'],
    'Age': [35, 30, 28, 25, 26],
    'Department': ['Development', 'HR', 'Sales', 'Development', 'HR'],
    'Salary': [90000, 70000, 60000, 75000, 72000]
})
print(df)

Q2. Filter employees who earn more than ₹70,000.

print(df[df['Salary'] > 70000])

Q3. Find the average salary of employees in each department.

print(df.groupby('Department')['Salary'].mean())

Q4. Find the maximum and minimum salaries.

print("Max Salary:", df['Salary'].max())
print("Min Salary:", df['Salary'].min())

Q5. Find the most common (mode) salary.

print(df['Salary'].mode())

Q6. Get a statistical summary of all numeric columns.

print(df.describe())

Q7. Add a new column “Experience” and calculate correlation between Salary and Experience.

df['Experience'] = [10, 7, 5, 6, 8]
print(df[['Salary', 'Experience']].corr())

Q8. Sort the DataFrame by Salary in descending order.

print(df.sort_values(by='Salary', ascending=False))

Q9. Replace department “HR” with “Human Resources”.

df['Department'] = df['Department'].replace('HR', 'Human Resources')
print(df)

Q10. Save the DataFrame to a CSV file named “itxperts_employees.csv” without index.

df.to_csv("itxperts_employees.csv", index=False)

🎯 MCQs on Pandas

Q1. Which of the following is NOT a core data structure in pandas?
A. Series
B. DataFrame
C. Array
D. Panel
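✅ Answer: C. Array (Panel existed in older pandas versions but has since been removed; Series and DataFrame are the core data structures today.)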

Q2. What function is used to read a CSV file in pandas?
A. read_table()
B. read_file()
C. read_csv()
D. open_csv()
✅ Answer: C. read_csv()

Q3. Which function returns the number of non-null values in a DataFrame column?
A. sum()
B. count()
C. len()
D. value_counts()
✅ Answer: B. count()

Q4. What does the describe() function do?
A. Shows null values
B. Sorts the data
C. Provides summary statistics
D. Removes duplicates
✅ Answer: C. Provides summary statistics

Q5. Which of the following is used to calculate correlation between two columns?
A. .cov()
B. .corr()
C. .mean()
D. .std()
✅ Answer: B. .corr()

🧪 Mini Project: Itxperts Performance Analysis

Step 1: Create DataFrame

df = pd.DataFrame({
    'Name': ['Vikram', 'Alice', 'Bob', 'John', 'Priya'],
    'Department': ['AI/ML', 'Web', 'Sales', 'Web', 'HR'],
    'Score': [92, 88, 75, 80, 70],
    'Experience': [10, 7, 5, 6, 4]
})

Step 2: Find department-wise average score

print(df.groupby('Department')['Score'].mean())

Step 3: Add a column for performance rating

def get_rating(score):
    if score >= 90:
        return 'Excellent'
    elif score >= 80:
        return 'Good'
    elif score >= 70:
        return 'Average'
    else:
        return 'Needs Improvement'

df['Rating'] = df['Score'].apply(get_rating)
print(df)

Step 4: Export top performers (score > 85)

top_performers = df[df['Score'] > 85]
top_performers.to_csv('top_performers.csv', index=False)

Important Pandas Questions for CBSE Class 12 Board Exams

The following 50 important questions on Pandas (Python – Data Handling using Pandas) can be asked in CBSE Class 12 Informatics Practices / Computer Science Board Exams. They cover 1-mark, 2-mark, and 3/4-mark types.

✅ 1-Mark Questions (Definition / MCQ Type)

What is Pandas in Python?
What are the two main data structures of Pandas?
What is a Series in Pandas?
What is a DataFrame?
Which method is used to read a CSV file in Pandas?
Which method returns the first 5 rows of a DataFrame?
Which attribute returns the number of rows and columns in a DataFrame?
Which function gives the statistical summary of a DataFrame?
What is the default indexing in a Pandas Series?
What does df.dtypes return?
Which function is used to write a DataFrame into a CSV file?
What is the difference between loc[] and iloc[]?
Which function is used to find null values in a DataFrame?
Which Pandas function is used to find the correlation between columns?
What is the output type of df['ColumnName']?

✅ 2-Mark Questions (Short Answer)

Write two differences between Series and DataFrame.
How is missing data represented in Pandas?
How can you rename a column in a DataFrame?
What is the use of dropna() and fillna() in Pandas?
Write a statement to sort a DataFrame by the column “Age” in descending order.
How can you filter rows where salary is more than ₹50,000?
Write a code to create a Series from a dictionary.
Write a Python statement to calculate the mean of column “Marks”.
Write a code to read an Excel file using Pandas.
What is the purpose of the groupby() function?
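
Several of the short-answer questions above come down to a single statement. The following is a sketch of possible answers, not an official marking scheme; the student data and the new column name 'Score' are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical student data
df = pd.DataFrame({
    'Name': ['Asha', 'Ravi', 'Meena'],
    'Age': [17, 18, 16],
    'Marks': [88, 75, 92]
})

# Rename a column (returns a new DataFrame)
renamed = df.rename(columns={'Marks': 'Score'})

# Sort by "Age" in descending order
by_age_desc = df.sort_values(by='Age', ascending=False)

# Mean of the "Marks" column
average_marks = df['Marks'].mean()

# Create a Series from a dictionary: keys become the index
fruit_prices = pd.Series({'Apple': 120, 'Banana': 40, 'Mango': 90})

print(renamed.columns.tolist())
print(by_age_desc)
print(average_marks)
print(fruit_prices)
```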

✅ 3/4-Mark Questions (Application-Based / Coding)

Write a Python program to create a DataFrame with columns: Name, Age, and Marks.
Write code to:

Read a CSV file named students.csv
Display first 3 rows

Write a code to drop the column “Email” from the DataFrame df.
Given a DataFrame df, write a code to count total null values in each column.
Create a DataFrame of 3 students and add a new column “Result” based on Marks.
Write a code to sort a DataFrame first by “Class” and then by “Marks”.
Create a Series of your 5 favorite fruits and print the first and last two elements.
How can you perform aggregation to find total salary department-wise?
Write a code to merge two DataFrames df1 and df2 based on “EmpID”.
How is value_counts() useful in analyzing data?
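
A few of the application-based questions above can be sketched on one small table. The employee data below is made up for illustration, and the answers are one possible approach among several.

```python
import numpy as np
import pandas as pd

# Hypothetical employee data with one missing salary
emp = pd.DataFrame({
    'EmpID': [1, 2, 3, 4],
    'Department': ['HR', 'Sales', 'HR', 'Sales'],
    'Salary': [50000, 40000, np.nan, 45000],
    'Email': ['a@example.com', 'b@example.com', 'c@example.com', 'd@example.com']
})

# Drop the "Email" column
without_email = emp.drop(columns=['Email'])

# Count null values in each column
nulls_per_column = emp.isnull().sum()

# Total salary department-wise (missing values are skipped)
total_by_dept = emp.groupby('Department')['Salary'].sum()

# How many employees each department has
dept_counts = emp['Department'].value_counts()

print(without_email)
print(nulls_per_column)
print(total_by_dept)
print(dept_counts)
```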

✅ Case-Based / Real Application Questions

Create a DataFrame of Itxperts employees with name, department, and experience. Display those with more than 5 years of experience.
From a CSV file “sales.csv”, read the data and show maximum, minimum, and average monthly sales.
Write a code to replace all NaN values in the “Marks” column with 0.
Explain the use of apply() function with an example.
Write a Python function to classify students as “Pass” or “Fail” based on Marks using apply().

✅ Output Prediction / Error Finding

Predict the output of:

import pandas as pd  
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])  
print(s['b'])

Predict the output of:

df = pd.DataFrame({'X': [1, 2], 'Y': [3, 4]})  
print(df.loc[1])

Find the error:

df = pd.DataFrame()  
print(df.head(10))  
df.sort('Salary')

What will be the output if:

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})  
print(df.mean())

What will df.describe(include='all') return?
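
For the "Find the error" question above, the problem is that current pandas has no DataFrame.sort() method, and an empty DataFrame has no 'Salary' column to sort by. A possible corrected version, assuming two made-up rows so that the column exists, uses sort_values().

```python
import pandas as pd

# Hypothetical data so that the 'Salary' column exists
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Salary': [80000, 60000]})

# head(10) is fine: it returns at most as many rows as exist
print(df.head(10))

# sort_values() replaces the old sort() method
sorted_df = df.sort_values(by='Salary')
print(sorted_df)
```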

✅ Conceptual / Reasoning-Based

Why is Pandas preferred for data analysis in Python?
What happens if the index values in a Series are not unique?
Differentiate between append() and concat() in Pandas.
Can a DataFrame have columns of different data types? Justify.
How is Pandas useful for handling real-world tabular data?
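
On the append() versus concat() question: DataFrame.append() was removed in pandas 2.0, so on recent versions pd.concat() is the way to stack DataFrames. The sketch below assumes two made-up one-row DataFrames.

```python
import pandas as pd

# Hypothetical single-row DataFrames
df1 = pd.DataFrame({'Name': ['Alice'], 'Age': [25]})
df2 = pd.DataFrame({'Name': ['Bob'], 'Age': [30]})

# Stack the rows; ignore_index=True renumbers the index as 0, 1, ...
combined = pd.concat([df1, df2], ignore_index=True)
print(combined)
```

Without ignore_index=True, both rows would keep their original label 0, which gives a Series or DataFrame with non-unique index values.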
