
Data Handling Using Pandas | Class 12 Informatics Practices Notes

July 5, 2025 · By @mritxperts

📌 Introduction to Pandas

Pandas is a fast, powerful, flexible open-source data analysis and manipulation library built on top of Python. It is designed to work with structured data like tables (Excel, CSV, SQL, etc.).

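Key Features: labeled one- and two-dimensional data structures (Series and DataFrame), tools for handling missing data, sorting and filtering, aggregation and statistical functions, merging and pivoting of tables, date handling, and reading from and writing to files such as CSV.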

Importing Pandas

import pandas as pd

Explanation: This imports pandas and assigns it the alias pd, which is a widely used convention.


🧰 Pandas Data Structures

1. Series

A Series is a one-dimensional array with labels (indexes).

Example:

import numpy as np
s = pd.Series([10, 20, np.nan, 40])
print(s)

Output:

0    10.0
1    20.0
2     NaN
3    40.0
dtype: float64

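Explanation: np.nan marks a missing value. Because NaN is a floating-point value, pandas stores the whole Series as float64, so the integers print as 10.0, 20.0, and 40.0. The left column (0 to 3) is the default integer index.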
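
The labels of a Series do not have to be the default 0, 1, 2, and so on. The following is a minimal sketch of a Series with custom labels; the subject names and marks are hypothetical values chosen only for illustration.

```python
import pandas as pd

# Hypothetical marks, used only to illustrate custom index labels
marks = pd.Series([85, 92, 78], index=['Maths', 'Physics', 'Chemistry'])

print(marks['Physics'])      # access a value by its label
print(marks.iloc[0])         # access a value by its position
print(marks.index.tolist())  # list all labels
```

Label-based access is often easier to read than positional access when the labels carry meaning.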


2. DataFrame

A DataFrame is a 2D labeled data structure with columns of potentially different types.

Example:

data = {
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30]
}
df = pd.DataFrame(data)
print(df)

Output:

    Name  Age
0  Alice   25
1    Bob   30

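Explanation: Each dictionary key becomes a column name and each list becomes that column's values. The row labels 0 and 1 are the default integer index.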


💪 Basic Operations

Viewing Data

print(df.head())
print(df.tail())

Output:

    Name  Age
0  Alice   25
1    Bob   30

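Explanation: head() returns the first five rows and tail() returns the last five rows by default. Because df has only two rows, each call prints the whole DataFrame, so the table above appears twice when the code is run.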

Structure and Summary

print(df.shape)
print(df.columns)
print(df.dtypes)
df.info()  # info() prints its own report and returns None, so it is not wrapped in print()

Output:

(2, 2)
Index(['Name', 'Age'], dtype='object')
Name    object
Age      int64
dtype: object
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    2 non-null      object
 1   Age     2 non-null      int64 
dtypes: int64(1), object(1)
memory usage: 160.0 bytes

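Explanation: shape gives (rows, columns), columns lists the column labels, dtypes shows the data type of each column, and info() prints a summary that includes non-null counts and memory usage.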


🔧 Data Selection

Column Selection

print(df['Name'])

Output:

0    Alice
1      Bob
Name: Name, dtype: object

Explanation: Selects a single column and returns it as a Series.

Row Selection

print(df.loc[0])
print(df.iloc[1])

Output:

Name    Alice
Age        25
Name: 0, dtype: object

Name     Bob
Age        30
Name: 1, dtype: object

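Explanation: loc[] selects rows by index label and iloc[] selects rows by integer position. With the default index the labels and positions coincide, so loc[0] is the first row and iloc[1] is the second row.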
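
The difference between loc[] and iloc[] becomes visible once the index is not the default 0, 1, 2. The sketch below assumes hypothetical row labels 'r1' and 'r2' on the same two-person data.

```python
import pandas as pd

people = pd.DataFrame(
    {'Name': ['Alice', 'Bob'], 'Age': [25, 30]},
    index=['r1', 'r2']  # hypothetical row labels, for illustration only
)

by_label = people.loc['r2']    # label-based: the row labelled 'r2'
by_position = people.iloc[1]   # position-based: the second row

print(by_label)
print(by_label.equals(by_position))
```

Here both calls return Bob's row, but only loc[] understands the label 'r2' and only iloc[] understands the position 1.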


🧹 Data Cleaning

Handling Missing Values

df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [4, np.nan, 6]})
print(df.isnull())
print(df.fillna(0))

Output:

       A      B
0  False  False
1  False   True
2   True  False

     A    B
0  1.0  4.0
1  2.0  0.0
2  0.0  6.0

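Explanation: isnull() returns True wherever a value is missing, and fillna(0) returns a new DataFrame with every NaN replaced by 0. The original df is not modified.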
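
Two other common ways to deal with missing values are dropping incomplete rows with dropna() and filling gaps with a column statistic. The sketch below reuses the same small data under the assumed name df_missing; filling with the column mean is just one possible choice.

```python
import numpy as np
import pandas as pd

df_missing = pd.DataFrame({'A': [1, 2, np.nan], 'B': [4, np.nan, 6]})

null_counts = df_missing.isnull().sum()         # missing values per column
rows_kept = df_missing.dropna()                 # keep only rows with no NaN
filled = df_missing.fillna(df_missing.mean())   # replace NaN with the column mean

print(null_counts)
print(rows_kept)
print(filled)
```

Only the first row survives dropna(), because the other two rows each contain one missing value.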


🔁 Sorting and Filtering

Sorting

print(df.sort_values(by='A'))

Explanation: Sorts the rows in ascending order of column ‘A’. Rows where ‘A’ is missing are placed last by default.

Filtering

print(df[df['A'] > 1])

Explanation: Returns rows where column ‘A’ has values greater than 1.
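
Filters can also combine several conditions. The sketch below uses a hypothetical staff table; each condition must be wrapped in parentheses and joined with & (and) or | (or), because Python's plain and/or keywords do not work element-wise on a Series.

```python
import pandas as pd

# Hypothetical data, used only to illustrate combined conditions
staff = pd.DataFrame({
    'Name': ['Asha', 'Ravi', 'Meera', 'Karan'],
    'Age': [24, 31, 28, 22],
    'Salary': [50000, 85000, 60000, 45000]
})

# Rows where BOTH conditions hold
both = staff[(staff['Age'] > 25) & (staff['Salary'] >= 60000)]

# Rows where AT LEAST ONE condition holds
either = staff[(staff['Age'] < 25) | (staff['Salary'] > 80000)]

print(both)
print(either)
```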


📈 Aggregation & Statistical Functions in Pandas

Sample DataFrame

import pandas as pd

df = pd.DataFrame({
    'Employee': ['Vikram', 'Alice', 'Bob', 'John', 'Priya'],
    'Salary': [90000, 80000, 60000, 60000, 80000],
    'Experience': [5, 4, 3, 3, 4]
})

1. max() – Maximum value

print(df['Salary'].max())

Output:

90000

Explanation: Returns the highest salary in the column.


2. min() – Minimum value

print(df['Salary'].min())

Output:

60000

Explanation: Returns the lowest salary.


3. count() – Count of non-null values

print(df['Salary'].count())

Output:

5

Explanation: Counts the number of non-missing entries in the ‘Salary’ column.


4. mean() – Average value

print(df['Salary'].mean())

Output:

74000.0

Explanation: Returns the arithmetic mean of all salaries.


5. mode() – Most frequent value(s)

print(df['Salary'].mode())

Output:

0    60000
1    80000
dtype: int64

Explanation: Shows modes; both 60000 and 80000 appear twice.


6. median() – Middle value

print(df['Salary'].median())

Output:

80000.0

Explanation: Middle salary value when sorted.


7. std() – Standard deviation

print(df['Salary'].std())

Output:

13416.40786

Explanation: Measures salary variation from the mean.


8. var() – Variance

print(df['Salary'].var())

Output:

180000000.0

Explanation: Returns the sample variance: the sum of squared deviations from the mean divided by n - 1 (pandas uses ddof=1 by default).
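
By default var() and std() in pandas divide by n - 1 (sample statistics). The sketch below uses the same five salaries to compare this with the population variance obtained by passing ddof=0.

```python
import pandas as pd

salary = pd.Series([90000, 80000, 60000, 60000, 80000])

sample_var = salary.var()            # default ddof=1: divide by n - 1
population_var = salary.var(ddof=0)  # divide by n

print(sample_var, population_var)    # 180000000.0 144000000.0
```

The squared deviations from the mean (74000) add up to 720000000, which gives 180000000 when divided by 4 and 144000000 when divided by 5.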


9. corr() – Correlation matrix

print(df.corr(numeric_only=True))

Output:

              Salary  Experience
Salary      1.000000    0.979958
Experience  0.979958    1.000000

Explanation: Measures how strongly salary and experience are related (1 = perfect positive).
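
When only one pair of columns is of interest, the correlation can also be computed directly between two Series. This is a minimal sketch using the same salary and experience values; the name df_emp is an assumption made for this example.

```python
import pandas as pd

df_emp = pd.DataFrame({
    'Salary': [90000, 80000, 60000, 60000, 80000],
    'Experience': [5, 4, 3, 3, 4]
})

# Pearson correlation between two columns (a single number, not a matrix)
r = df_emp['Salary'].corr(df_emp['Experience'])
print(round(r, 6))
```

The result is the same value that appears off the diagonal in the correlation matrix.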


10. cov() – Covariance matrix

print(df.cov(numeric_only=True))

Output:

                 Salary  Experience
Salary      180000000.0     11000.0
Experience      11000.0         0.7

Explanation: Shows how salary and experience vary together.


📆 Working with Dates

dates = pd.date_range('2023-01-01', periods=3)
df = pd.DataFrame({'Date': dates, 'Visitors': [100, 200, 300]})
df['Day'] = df['Date'].dt.day_name()
print(df)

Output:

        Date  Visitors      Day
0 2023-01-01       100   Sunday
1 2023-01-02       200   Monday
2 2023-01-03       300  Tuesday

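Explanation: date_range() creates three consecutive daily dates starting on 2023-01-01, and the .dt.day_name() accessor returns the weekday name for each date.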
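
Dates often arrive as plain text, for example when read from a CSV file. The sketch below assumes a hypothetical visits table with dates stored as strings and converts them with pd.to_datetime() so that the .dt accessor can be used.

```python
import pandas as pd

# Hypothetical data: the dates start out as ordinary strings
visits = pd.DataFrame({
    'Date': ['2023-01-01', '2023-02-15', '2023-03-31'],
    'Visitors': [100, 200, 300]
})

visits['Date'] = pd.to_datetime(visits['Date'])   # convert text to datetime
visits['Month'] = visits['Date'].dt.month         # month number (1 to 12)
visits['Year'] = visits['Date'].dt.year
visits['Weekday'] = visits['Date'].dt.day_name()

print(visits)
```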


🔀 Merging and Joining

founders = pd.DataFrame({
    'Company': ['Itxperts', 'TechNova'],
    'Founder': ['Vikram Singh Rawat', 'Sara Khan']
})

revenue = pd.DataFrame({
    'Company': ['Itxperts', 'TechNova'],
    'Revenue': [5000000, 3000000]
})

result = pd.merge(founders, revenue, on='Company')
print(result)

Output:

    Company             Founder  Revenue
0  Itxperts  Vikram Singh Rawat  5000000
1  TechNova           Sara Khan  3000000

Explanation: Merges two DataFrames using the common column ‘Company’.
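
By default merge() performs an inner join, which keeps only the keys present in both DataFrames. The sketch below adds two hypothetical companies ('DataWorks' and 'CloudNine', each present in only one table) to show how the how parameter changes the result.

```python
import pandas as pd

# 'DataWorks', 'CloudNine' and 'A. Example' are hypothetical, for illustration only
founders = pd.DataFrame({
    'Company': ['Itxperts', 'TechNova', 'DataWorks'],
    'Founder': ['Vikram Singh Rawat', 'Sara Khan', 'A. Example']
})
revenue = pd.DataFrame({
    'Company': ['Itxperts', 'TechNova', 'CloudNine'],
    'Revenue': [5000000, 3000000, 1000000]
})

inner = pd.merge(founders, revenue, on='Company')                   # keys in both tables
left_join = pd.merge(founders, revenue, on='Company', how='left')   # all rows of founders
outer = pd.merge(founders, revenue, on='Company', how='outer')      # keys from either table

print(len(inner), len(left_join), len(outer))  # 2 3 4
```

Rows without a match in the other table get NaN in the columns that come from that table.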


🔃 Pivot Tables

data = pd.DataFrame({
    'Company': ['Itxperts', 'Itxperts', 'TechNova', 'TechNova'],
    'Quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'Profit': [120000, 150000, 100000, 110000]
})

pivot = data.pivot_table(values='Profit', index='Company', columns='Quarter')
print(pivot)

Output:

Quarter         Q1        Q2
Company                     
Itxperts  120000.0  150000.0
TechNova  100000.0  110000.0

Explanation: Creates a pivot table showing profits by quarter for each company. pivot_table() aggregates with the mean by default, so the values are shown as floats.
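
The aggregation only matters when a company has more than one row for the same quarter. The sketch below uses a hypothetical orders table with such repeated entries to compare the default mean with aggfunc='sum'; fill_value=0 replaces combinations that have no data.

```python
import pandas as pd

# Hypothetical data with two Q1 rows for Itxperts and no Q2 row for TechNova
orders = pd.DataFrame({
    'Company': ['Itxperts', 'Itxperts', 'Itxperts', 'TechNova'],
    'Quarter': ['Q1', 'Q1', 'Q2', 'Q1'],
    'Profit': [100, 300, 150, 200]
})

# Default aggregation: mean of the repeated entries
mean_table = orders.pivot_table(values='Profit', index='Company', columns='Quarter')

# Total profit instead of the mean, with 0 where no data exists
sum_table = orders.pivot_table(values='Profit', index='Company', columns='Quarter',
                               aggfunc='sum', fill_value=0)

print(mean_table)
print(sum_table)
```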


📊 Data Visualization (Requires matplotlib)

import matplotlib.pyplot as plt

sales = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar'],
    'Itxperts': [30000, 35000, 40000]
})

sales.plot(x='Month', y='Itxperts', kind='bar', title='Itxperts Monthly Sales')
plt.ylabel('Revenue')
plt.show()

Explanation: Plots a bar chart of monthly revenue for Itxperts.


📝 Practice Questions and Answers

Q1. Create a DataFrame for 5 employees of Itxperts with columns: Name, Age, Department, and Salary.

df = pd.DataFrame({
    'Name': ['Vikram Singh Rawat', 'Alice', 'Bob', 'John', 'Priya'],
    'Age': [35, 30, 28, 25, 26],
    'Department': ['Development', 'HR', 'Sales', 'Development', 'HR'],
    'Salary': [90000, 70000, 60000, 75000, 72000]
})
print(df)

Q2. Filter employees who earn more than ₹70,000.

print(df[df['Salary'] > 70000])

Q3. Find the average salary of employees in each department.

print(df.groupby('Department')['Salary'].mean())

Q4. Find the maximum and minimum salaries.

print("Max Salary:", df['Salary'].max())
print("Min Salary:", df['Salary'].min())

Q5. Find the most common (mode) salary.

print(df['Salary'].mode())

Q6. Get a statistical summary of all numeric columns.

print(df.describe())

Q7. Add a new column “Experience” and calculate correlation between Salary and Experience.

df['Experience'] = [10, 7, 5, 6, 8]
print(df[['Salary', 'Experience']].corr())

Q8. Sort the DataFrame by Salary in descending order.

print(df.sort_values(by='Salary', ascending=False))

Q9. Replace department “HR” with “Human Resources”.

df['Department'] = df['Department'].replace('HR', 'Human Resources')
print(df)

Q10. Save the DataFrame to a CSV file named “itxperts_employees.csv” without index.

df.to_csv("itxperts_employees.csv", index=False)
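
A saved CSV file can be loaded again with pd.read_csv(). The sketch below is a self-contained round trip that writes to an in-memory text buffer (io.StringIO) instead of a real file, so it runs without touching the disk; with a file you would pass the file name to both calls. The two-row employees table is hypothetical.

```python
import io
import pandas as pd

# Hypothetical data, for illustration only
employees = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Salary': [70000, 60000]})

buffer = io.StringIO()                 # stands in for a CSV file on disk
employees.to_csv(buffer, index=False)  # write without the index column
buffer.seek(0)                         # rewind to the start before reading

loaded = pd.read_csv(buffer)           # read the CSV text back into a DataFrame
print(loaded)
```

Because index=False was used when writing, the loaded DataFrame has only the Name and Salary columns and gets a fresh default index.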

🎯 MCQs on Pandas

Q1. Which of the following is NOT a core data structure in pandas?
A. Series
B. DataFrame
C. Array
D. Panel
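Answer: C. Array (Note: Panel was a 3D data structure in older pandas releases and has since been removed, so current versions offer only Series and DataFrame.)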


Q2. What function is used to read a CSV file in pandas?
A. read_table()
B. read_file()
C. read_csv()
D. open_csv()
Answer: C. read_csv()


Q3. Which function returns the number of non-null values in a DataFrame column?
A. sum()
B. count()
C. len()
D. value_counts()
Answer: B. count()


Q4. What does the describe() function do?
A. Shows null values
B. Sorts the data
C. Provides summary statistics
D. Removes duplicates
Answer: C. Provides summary statistics


Q5. Which of the following is used to calculate correlation between two columns?
A. .cov()
B. .corr()
C. .mean()
D. .std()
Answer: B. .corr()


🧪 Mini Project: Itxperts Performance Analysis

Step 1: Create DataFrame

df = pd.DataFrame({
    'Name': ['Vikram', 'Alice', 'Bob', 'John', 'Priya'],
    'Department': ['AI/ML', 'Web', 'Sales', 'Web', 'HR'],
    'Score': [92, 88, 75, 80, 70],
    'Experience': [10, 7, 5, 6, 4]
})

Step 2: Find department-wise average score

print(df.groupby('Department')['Score'].mean())

Step 3: Add a column for performance rating

def get_rating(score):
    if score >= 90:
        return 'Excellent'
    elif score >= 80:
        return 'Good'
    elif score >= 70:
        return 'Average'
    else:
        return 'Needs Improvement'

df['Rating'] = df['Score'].apply(get_rating)
print(df)

Step 4: Export top performers (score > 85)

top_performers = df[df['Score'] > 85]
top_performers.to_csv('top_performers.csv', index=False)

Important Pandas Questions for CBSE Class 12 Board Exams

Below are 50 important questions on Pandas (Python: Data Handling using Pandas) that may be asked in the CBSE Class 12 Informatics Practices / Computer Science board exams, covering 1-mark, 2-mark, and 3/4-mark types:


1-Mark Questions (Definition / MCQ Type)

  1. What is Pandas in Python?
  2. What are the two main data structures of Pandas?
  3. What is a Series in Pandas?
  4. What is a DataFrame?
  5. Which method is used to read a CSV file in Pandas?
  6. Which method returns the first 5 rows of a DataFrame?
  7. Which attribute returns the number of rows and columns in a DataFrame?
  8. Which function gives the statistical summary of a DataFrame?
  9. What is the default indexing in a Pandas Series?
  10. What does df.dtypes return?
  11. Which function is used to write a DataFrame into a CSV file?
  12. What is the difference between loc[] and iloc[]?
  13. Which function is used to find null values in a DataFrame?
  14. Which Pandas function is used to find the correlation between columns?
  15. What is the output type of df['ColumnName']?

2-Mark Questions (Short Answer)

  1. Write two differences between Series and DataFrame.
  2. How is missing data represented in Pandas?
  3. How can you rename a column in a DataFrame?
  4. What is the use of dropna() and fillna() in Pandas?
  5. Write a statement to sort a DataFrame by the column “Age” in descending order.
  6. How can you filter rows where salary is more than ₹50,000?
  7. Write a code to create a Series from a dictionary.
  8. Write a Python statement to calculate the mean of column “Marks”.
  9. Write a code to read an Excel file using Pandas.
  10. What is the purpose of the groupby() function?

3/4-Mark Questions (Application-Based / Coding)

  1. Write a Python program to create a DataFrame with columns: Name, Age, and Marks.
  2. Write code to:
  3. Write a code to drop the column “Email” from the DataFrame df.
  4. Given a DataFrame df, write a code to count total null values in each column.
  5. Create a DataFrame of 3 students and add a new column “Result” based on Marks.
  6. Write a code to sort a DataFrame first by “Class” and then by “Marks”.
  7. Create a Series of your 5 favorite fruits and print the first and last two elements.
  8. How can you perform aggregation to find total salary department-wise?
  9. Write a code to merge two DataFrames df1 and df2 based on “EmpID”.
  10. How is value_counts() useful in analyzing data?

Case-Based / Real Application Questions

  1. Create a DataFrame of ITxperts employees with name, department, and experience. Display those with more than 5 years of experience.
  2. From a CSV file “sales.csv”, read the data and show maximum, minimum, and average monthly sales.
  3. Write a code to replace all NaN values in the “Marks” column with 0.
  4. Explain the use of apply() function with an example.
  5. Write a Python function to classify students as “Pass” or “Fail” based on Marks using apply().

Output Prediction / Error Finding

  1. Predict the output of:
     import pandas as pd
     s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
     print(s['b'])
  2. Predict the output of:
     df = pd.DataFrame({'X': [1, 2], 'Y': [3, 4]})
     print(df.loc[1])
  3. Find the error:
     df = pd.DataFrame()
     print(df.head(10))
     df.sort('Salary')
  4. What will be the output if:
     df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
     print(df.mean())
  5. What will df.describe(include='all') return?

Conceptual / Reasoning-Based

  1. Why is Pandas preferred for data analysis in Python?
  2. What happens if the index values in a Series are not unique?
  3. Differentiate between append() and concat() in Pandas.
  4. Can a DataFrame have columns of different data types? Justify.
  5. How is Pandas useful for handling real-world tabular data?