Writing to CSV with Pandas in Python

Reading data into Pandas dataframe

Pandas is a popular library for data analysis in Python. It provides easy-to-use and highly optimized tools to manipulate and analyze data. In this section, we will explore how to read data into a Pandas dataframe.

Reading CSV files

Reading CSV files is one of the most common operations in Pandas. Suppose we have a CSV file “data.csv” that looks like this:

name,age,gender
Alice,25,Female
Bob,30,Male
Charlie,35,Male

To read this file, we can use the read_csv() function from Pandas:

import pandas as pd

df = pd.read_csv('data.csv')

This will create a Pandas dataframe with the contents of the CSV file. By default, the first row of the CSV file is assumed to be the header row, and is used as the column names of the resulting dataframe.

Reading Excel files

Reading Excel files is also supported by Pandas. Suppose we have an Excel file “data.xlsx” that looks like this:

name	age	gender
Alice	25	Female
Bob	30	Male
Charlie	35	Male

To read this file, we can use the read_excel() function from Pandas:

import pandas as pd

df = pd.read_excel('data.xlsx')

This will create a Pandas dataframe with the contents of the Excel file. By default, the first row of the Excel file is assumed to be the header row, and is used as the column names of the resulting dataframe. We can also specify the sheet name if the Excel file contains multiple sheets:

import pandas as pd

df = pd.read_excel('data.xlsx', sheet_name='Sheet1')

Reading JSON files

Reading JSON files is another operation that is supported by Pandas. Suppose we have a JSON file “data.json” that looks like this:

[
	{
		"name": "Alice",
		"age": 25,
		"gender": "Female"
	},
	{
		"name": "Bob",
		"age": 30,
		"gender": "Male"
	},
	{
		"name": "Charlie",
		"age": 35,
		"gender": "Male"
	}
]

To read this file, we can use the read_json() function from Pandas:

import pandas as pd

df = pd.read_json('data.json')

This will create a Pandas dataframe with the contents of the JSON file. The keys in the JSON objects will be used as the column names of the resulting dataframe.

Exporting Pandas dataframe to CSV

Exporting Pandas dataframe to CSV is a common operation in data analysis. A CSV file is a simple text file that contains values separated by commas. In this section, we will explore how to export data from a Pandas dataframe to a CSV file.

import pandas as pd

# create a dataframe
data = {'name': ['Alice', 'Bob', 'Charlie'], 'age': [25, 30, 35], 'gender': ['Female', 'Male', 'Male']}
df = pd.DataFrame(data)

# export the dataframe to a CSV file
df.to_csv('data.csv', index=False)

Here, we created a Pandas dataframe with three columns - name, age, and gender. We then used the to_csv() function to export the dataframe to a CSV file.

The index parameter is set to False to avoid writing the index column to the CSV file. We can also specify a different delimiter character using the sep parameter. For example, to use a tab character as a delimiter, we can do:

df.to_csv('data.tsv', sep='\t', index=False)

This will create a tab-separated values (TSV) file instead of a CSV file.

We can also append data to an existing CSV file using the mode parameter. For example, to append the dataframe to an existing file “data.csv”, we can do:

df.to_csv('data.csv', mode='a', header=False, index=False)

Here, the mode parameter is set to 'a' (append), and the header parameter is set to False to avoid writing the header row again.

Customizing CSV export options

Pandas provides several options to customize the way data is exported to a CSV file. These options can be used to control the formatting of the output, the handling of missing values, and more.

Formatting the output

The to_csv() function can take several parameters to customize the output format of the CSV file. For example, we can specify the number of digits to display for floating-point numbers using the float_format parameter:

df.to_csv('data.csv', index=False, float_format='%.2f')

This will format all floating-point numbers to display 2 digits after the decimal point.

We can also specify the format of the date and time fields using the date_format and datetime_format parameters, respectively:

df.to_csv('data.csv', index=False, date_format='%Y-%m-%d', datetime_format='%Y-%m-%d %H:%M:%S')

This will format date fields in YYYY-MM-DD format, and datetime fields in YYYY-MM-DD HH:MM:SS format.

Handling missing values

By default, missing values (i.e. NaN, NaT) are exported as empty strings in CSV files. We can change this behavior using the na_rep parameter:

df.to_csv('data.csv', index=False, na_rep='N/A')

This will export all missing values as the string 'N/A'.

Controlling the output encoding

Pandas uses the UTF-8 encoding by default when writing to CSV files. We can change this encoding using the encoding parameter:

df.to_csv('data.csv', index=False, encoding='latin1')

This will use the ISO-8859-1 (also known as Latin-1) encoding instead of UTF-8.

Other options

There are several other options that can be passed to the to_csv() function to customize the CSV export behavior. Some of these options include:

sep: The delimiter character to use (default is ,)
quotechar: The quote character to use (default is "). Note that quotes are used to enclose text fields that contain the delimiter character.
header: Whether to write the column names to the CSV file (default is True)
line_terminator: The line terminator character to use (default is \n)
decimal: The character to use as the decimal separator (default is .)

Summary

In this article, we explored how to read data from CSV and Excel files as well as JSON files into a Pandas dataframe. We then looked at how to export data from a Pandas dataframe to a CSV file. Finally, we explored how to customize the CSV export behavior using various options.

If you are working with data in Python, then knowing how to read and write CSV files with Pandas is a must. It’s a simple and effective way to import and export data from and to various sources. By using the various options provided by Pandas, you can customize the output format to meet your needs.

One piece of advice when working with Pandas dataframes is to always be mindful of the data types of your columns. Pandas will try to infer the data type of each column when reading data from a file, but sometimes the inference may not be accurate. Make sure to double-check the data types of your columns and convert them if necessary before exporting the data to a CSV file.