How to Work with CSV and JSON Datasets in Python

Published on August 3, 2025 by @mritxperts

Introduction

In Machine Learning and data analysis projects, CSV and JSON are among the most commonly used file formats for datasets. Python makes it easy to load, parse, and manipulate both formats with the built-in csv and json modules and with external packages such as pandas. In this post, you’ll learn how to work with CSV and JSON data efficiently.


What is a CSV File?

CSV (Comma-Separated Values) is a simple text file format used to store tabular data like spreadsheets. Each line represents a row, and commas separate the columns.

Example:

Name,Age,Gender
Alice,30,Female
Bob,25,Male
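
To make the row-and-column structure concrete, here is a minimal sketch that parses the sample above with Python's built-in csv module. The data is held in a string (the variable names csv_text and rows are illustrative) so that no file on disk is needed.

```python
import csv
import io

# The sample table from above, kept in memory instead of a file.
csv_text = "Name,Age,Gender\nAlice,30,Female\nBob,25,Male\n"

# DictReader treats the first line as column names and yields one dict per row.
rows = list(csv.DictReader(io.StringIO(csv_text)))

print(rows[0]["Name"])  # Alice
print(rows[1]["Age"])   # 25 (a string: the csv module does not infer types)
```

Note that every value comes back as a string; converting Age to a number is left to you (or to pandas, as shown later).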

What is a JSON File?

JSON (JavaScript Object Notation) is a format used to store and transport structured data using key-value pairs. It’s commonly used in APIs and web data.

Example:

{
  "Name": "Alice",
  "Age": 30,
  "Gender": "Female"
}
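
As a quick illustration, the sample object above can be parsed from a string with json.loads. This is only a sketch; json_text and record are illustrative names.

```python
import json

# The sample object from above as a JSON string.
json_text = '{"Name": "Alice", "Age": 30, "Gender": "Female"}'

# json.loads turns a JSON object into a Python dict.
record = json.loads(json_text)

print(type(record).__name__)  # dict
print(record["Age"] + 1)      # 31 (Age is parsed as an int, not a string)
```

Unlike CSV, JSON distinguishes numbers from strings, so Age is already an integer after parsing.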

Reading CSV Files Using Pandas

import pandas as pd

# Load CSV file
df = pd.read_csv('data.csv')

# Display first 5 rows
print(df.head())

You can also specify:

  • delimiter (or its alias sep): if the separator is not a comma.
  • usecols: to load only specific columns.
  • na_values: to treat additional strings as missing values.
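
The options above can be combined in one call. The following sketch assumes a hypothetical semicolon-separated dataset in which "?" marks a missing value; the data and the name df_opts are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical semicolon-separated data; "?" stands for a missing age.
raw = "Name;Age;Gender\nAlice;30;Female\nBob;?;Male\n"

df_opts = pd.read_csv(
    io.StringIO(raw),         # a file path such as 'data.csv' works the same way
    delimiter=";",            # the separator is not a comma
    usecols=["Name", "Age"],  # skip the Gender column
    na_values=["?"],          # treat "?" as a missing value
)

print(df_opts)
print(df_opts["Age"].isna().sum())  # 1
```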

Writing to a CSV File

df.to_csv('output.csv', index=False)

Setting index=False ensures the index column is not saved in the file.
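
To see the effect of index=False, this small sketch compares the header line with and without it. It relies on to_csv returning the CSV text as a string when no path is given; df_small is an illustrative DataFrame.

```python
import pandas as pd

df_small = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [30, 25]})

# With no path argument, to_csv returns the CSV text instead of writing a file.
with_index = df_small.to_csv()
without_index = df_small.to_csv(index=False)

print(with_index.splitlines()[0])     # ,Name,Age
print(without_index.splitlines()[0])  # Name,Age
```

The leading comma in the first header is the unnamed index column, which would show up as an extra column when the file is read back.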


Reading JSON Files

Python provides a built-in json module.

import json

# Load JSON file
with open('data.json', 'r') as file:
    data = json.load(file)

print(data)

For a JSON dataset structured as a list of dictionaries, you can load it into a DataFrame:

df = pd.DataFrame(data)
print(df.head())
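
Here is a self-contained sketch of that conversion, using a small made-up list of records in place of data.json (json_text, records, and df_records are illustrative names):

```python
import json
import pandas as pd

# A JSON array of objects, i.e. a list of dictionaries once parsed.
json_text = '[{"Name": "Alice", "Age": 30}, {"Name": "Bob", "Age": 25}]'
records = json.loads(json_text)

# Each dictionary becomes a row; the keys become column names.
df_records = pd.DataFrame(records)

print(df_records.shape)          # (2, 2)
print(list(df_records.columns))  # ['Name', 'Age']
```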

Writing JSON Files

with open('output.json', 'w') as file:
    json.dump(data, file, indent=4)

Setting indent=4 pretty-prints the output with four spaces per nesting level, which makes it easier to read.
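
The difference is easy to see with json.dumps, which returns the JSON text as a string instead of writing it to a file. A minimal sketch with an illustrative record:

```python
import json

record = {"Name": "Alice", "Age": 30}

compact = json.dumps(record)           # everything on one line
pretty = json.dumps(record, indent=4)  # one key per line, indented by 4 spaces

print(compact)  # {"Name": "Alice", "Age": 30}
print(pretty)

# Both strings describe the same data; only the whitespace differs.
print(json.loads(compact) == json.loads(pretty))  # True
```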


Using Pandas to Read JSON

df = pd.read_json('data.json')
print(df.head())

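You can also convert a DataFrame to JSON. With orient='records' and lines=True, pandas writes one JSON object per line (the JSON Lines format) rather than a single JSON array, so read such a file back with pd.read_json('output.json', lines=True):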

df.to_json('output.json', orient='records', lines=True)
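
The sketch below round-trips a small illustrative DataFrame through the line-delimited output, working in memory rather than on disk (to_json returns a string when no path is given). The names df_people, jsonl_text, and df_back are made up for the example.

```python
import io
import pandas as pd

df_people = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [30, 25]})

# One JSON object per line, one line per row.
jsonl_text = df_people.to_json(orient="records", lines=True)
print(jsonl_text)

# Line-delimited JSON must be read back with lines=True.
df_back = pd.read_json(io.StringIO(jsonl_text), lines=True)
print(df_back["Name"].tolist())  # ['Alice', 'Bob']
print(df_back["Age"].tolist())   # [30, 25]
```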

Key Differences Between CSV and JSON

Feature      CSV                  JSON
Structure    Tabular              Hierarchical
Readability  Human-readable       Human-readable
Best for     Tables/Spreadsheets  Nested or complex data
Size         Smaller              Slightly larger
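
The size difference comes mainly from JSON repeating the key names in every record, while CSV states the column names once in the header. A rough sketch with a tiny made-up dataset (the exact numbers depend on your data):

```python
import pandas as pd

df_cmp = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [30, 25]})

# Serialize the same rows to both formats, in memory.
csv_text = df_cmp.to_csv(index=False)
json_text = df_cmp.to_json(orient="records")

print(len(csv_text), len(json_text))
print(len(csv_text) < len(json_text))  # True for this data
```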

Conclusion

Understanding how to handle CSV and JSON datasets is essential for any data-driven project. Python’s built-in json module and the pandas library make it simple to read, write, and manipulate data stored in these formats. Mastering this skill will help you handle real-world data more effectively in your Machine Learning journey.