Introduction to Pandas: DataFrames Made Easy
When working with data in Python, one of the most powerful and widely used libraries is Pandas. It’s designed to make data manipulation and analysis fast and easy, especially when dealing with structured data.
In this post, we’ll explore the basics of Pandas: its core data structures and how to manipulate data efficiently.
What is Pandas?
Pandas is an open-source data analysis and manipulation tool built on top of the Python programming language. It offers two primary data structures:
- Series: A one-dimensional labeled array.
- DataFrame: A two-dimensional labeled data structure (like a spreadsheet or SQL table).
Why Use Pandas?
- Simplifies data cleaning and preparation
- Offers intuitive methods for filtering and aggregating data
- Integrates seamlessly with other libraries such as NumPy, Matplotlib, and scikit-learn
- Reads and writes CSV, Excel, and JSON files with very little code
Installing Pandas
If you haven’t already, install Pandas using pip:
pip install pandas
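To check that the installation worked, a quick sketch like the one below should be enough. It only imports the library and prints its version; the exact version string depends on your environment.

```python
import pandas as pd

# pd.__version__ holds the installed version as a string.
# The value you see depends on what pip installed on your machine.
version = pd.__version__
print(version)
```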
The Series: A One-Dimensional Data Structure
import pandas as pd
data = pd.Series([10, 20, 30, 40])
print(data)
You can also label each item:
data = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(data['b']) # Output: 20
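As a small illustration of what the labels give you, the sketch below reuses the same Series and compares label-based access with position-based access. The variable names (by_label, by_position, doubled) are only for illustration.

```python
import pandas as pd

data = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Access by label and by integer position reach the same element here.
by_label = data["b"]
by_position = data.iloc[1]
print(by_label, by_position)  # 20 20

# Arithmetic is applied element by element, and the labels are kept.
doubled = data * 2
print(doubled["c"])  # 60
print(list(doubled.index))  # ['a', 'b', 'c']
```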
The DataFrame: Two-Dimensional Table
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Score': [85, 90, 95]
}
df = pd.DataFrame(data)
print(df)
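Once a DataFrame exists, it helps to look at its shape and column types before doing anything else. The following is a minimal sketch using the same Name/Score data; it also shows that a single column of a DataFrame is a Series.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Score": [85, 90, 95],
})

# shape is (number of rows, number of columns).
print(df.shape)  # (3, 2)

# Column labels and the data type pandas inferred for each column.
print(list(df.columns))  # ['Name', 'Score']
print(df.dtypes)

# Selecting one column returns a Series that shares the row index.
score_column = df["Score"]
print(type(score_column).__name__)  # Series
```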
Reading Data from CSV
df = pd.read_csv('data.csv')
print(df.head()) # Show the first 5 rows
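If you don't have a data.csv file at hand, you can try read_csv on in-memory text instead. The sketch below assumes a small, made-up CSV with the same Name and Score columns used earlier; read_csv accepts a file-like object such as io.StringIO as well as a path.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real data.csv file.
csv_text = "Name,Score\nAlice,85\nBob,90\nCharlie,95\n"

# The first line of the text is used as the column header.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(2))  # only the first 2 rows
print(df.shape)    # (3, 2)
```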
Selecting and Filtering Data
- Select a column:
df['Score']
- Filter rows:
df[df['Score'] > 90]
- Select a specific value by row and column:
df.loc[0, 'Name'] # by label: row label 0, column 'Name'
df.iloc[0, 1] # by position: first row, second column
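These selection tools can be combined. The sketch below, built on the same Name/Score data, shows one way to reuse a boolean mask, pick a column for the matching rows, and combine two conditions. Note that each condition needs its own parentheses when joined with &.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Score": [85, 90, 95],
})

# A boolean mask is a Series of True/False values, one per row.
mask = df["Score"] > 85

# Rows where the mask is True, restricted to the Name column.
names = df.loc[mask, "Name"]
print(names.tolist())  # ['Bob', 'Charlie']

# Two conditions combined with & (element-wise "and").
between = df[(df["Score"] >= 85) & (df["Score"] < 95)]
print(between["Name"].tolist())  # ['Alice', 'Bob']

# Position-based slice: the first two rows, all columns.
first_two = df.iloc[0:2]
print(first_two.shape)  # (2, 2)
```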
Common DataFrame Operations
- Add a new column:
df['Passed'] = df['Score'] > 80
- Sort values:
df.sort_values('Score', ascending=False) # returns a new sorted DataFrame; df itself is unchanged
- Describe the data:
df.describe()
- Check for nulls:
df.isnull().sum()
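Here is a rough sketch of how these operations might fit together on data with a gap in it. The missing score for Bob is invented for illustration, and filling it with the column mean is just one possible choice, not a recommendation.

```python
import pandas as pd

# Hypothetical data: Bob's score is missing (None becomes NaN).
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Score": [85.0, None, 95.0],
})

# Count missing values per column.
missing = df.isnull().sum()
print(missing["Score"])  # 1

# Fill the gap with the column mean; mean() skips NaN, so this is 90.0.
filled = df.fillna({"Score": df["Score"].mean()})

# Add a derived column, then sort. Both fillna and sort_values
# return new DataFrames, so df still contains the missing value.
filled["Passed"] = filled["Score"] > 80
ranked = filled.sort_values("Score", ascending=False)
print(ranked["Name"].tolist())  # ['Charlie', 'Bob', 'Alice']
```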
Exporting Data
You can save the cleaned or modified DataFrame to a new CSV file:
df.to_csv('cleaned_data.csv', index=False)
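To see what index=False changes without writing a file, you can call to_csv with no path, in which case it returns the CSV text as a string. This is a small sketch on made-up rows; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Score": [85, 90]})

# With no path argument, to_csv returns the CSV content as a string.
without_index = df.to_csv(index=False)
with_index = df.to_csv()

# index=False leaves out the row labels (0, 1, ...).
print(without_index.splitlines()[0])  # Name,Score
# By default the index is written as an unnamed first column.
print(with_index.splitlines()[0])     # ,Name,Score
```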
Summary
Pandas makes data manipulation intuitive and efficient. It’s the go-to library for handling tabular data in Python, and it is commonly used to load, clean, and prepare data in machine learning workflows.