Pandas: Exporting Data to Excel
Preparing Your Dataframe
Before we start exporting the DataFrame to Excel, we need to ensure that our DataFrame is well-prepared. This includes cleaning the data, renaming columns, modifying columns, and adding new columns if necessary.
Let’s start by creating a sample DataFrame:
import pandas as pd
data = {'Name':['John', 'Smith', 'Alex', 'Mike'], 'Age':[35, 42, 27, 29], 'Gender':['M', 'M', 'M', 'M']}
df = pd.DataFrame(data)
Here, we have created a DataFrame with three columns: Name, Age, and Gender. Now, let’s say we want to rename the Gender column to Sex. Pandas provides the rename() method for this purpose. We can use the columns parameter to map the old column name to the new one.
df.rename(columns={'Gender':'Sex'}, inplace=True)
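As a small sketch of the same idea without inplace=True: rename() returns a new DataFrame by default, so the result can be assigned to a new name and the original is left untouched. The variable name renamed and the shortened sample data are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Smith'], 'Age': [35, 42], 'Gender': ['M', 'M']})

# rename() returns a new DataFrame unless inplace=True is passed
renamed = df.rename(columns={'Gender': 'Sex'})

print(list(renamed.columns))  # ['Name', 'Age', 'Sex']
print(list(df.columns))       # ['Name', 'Age', 'Gender'] (original unchanged)
```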
Next, let’s say we want to add a new column called Profession to our DataFrame. We can create a list of values for the new column and then use the insert() method to add it to the DataFrame.
professions = ['Doctor', 'Engineer', 'Teacher', 'Lawyer']
df.insert(2, 'Profession', professions)
Now our DataFrame has four columns: Name, Age, Profession, and Sex.
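To make the position argument concrete, here is a minimal sketch on a reduced copy of the sample data. The first argument of insert() is the zero-based position of the new column, and the method modifies the DataFrame in place.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Smith'], 'Age': [35, 42], 'Sex': ['M', 'M']})

# Position 2 places the new column after Name (0) and Age (1)
df.insert(2, 'Profession', ['Doctor', 'Engineer'])

print(list(df.columns))  # ['Name', 'Age', 'Profession', 'Sex']
```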
We can further modify our DataFrame by dropping columns that we don’t need. For example, let’s say we want to drop the Age column:
df.drop(['Age'], axis=1, inplace=True)
Finally, we need to ensure that our DataFrame does not have any missing or null values. We can use the dropna() method to remove any rows with null values.
df.dropna(inplace=True)
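Because dropna() discards every row that contains a null, it can help to count the missing values first. The sketch below uses a made-up DataFrame with one missing profession; the subset argument restricts the check to the columns that matter.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Smith', 'Alex'],
    'Profession': ['Doctor', None, 'Teacher'],
})

# Count missing values per column before deciding how to handle them
missing = df.isna().sum()
print(missing)

# Drop only the rows where Profession is missing
cleaned = df.dropna(subset=['Profession'])
print(cleaned['Name'].tolist())
```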
By following these simple steps, we can easily prepare our DataFrame for exporting to Excel using pandas.
Using Pandas to Export to Excel
Exporting a pandas DataFrame to an Excel file is a very common operation in data analysis. Pandas provides the to_excel() method, which allows us to export our DataFrame to an Excel file.
First, we need to create a pandas DataFrame with our desired data:
import pandas as pd
data = {'Name':['John', 'Smith', 'Alex', 'Mike'], 'Sex':['M', 'M', 'M', 'M'], 'Profession':['Doctor', 'Engineer', 'Teacher', 'Lawyer']}
df = pd.DataFrame(data)
Next, we can use the to_excel() method to export the DataFrame to an Excel file:
df.to_excel("my_data.xlsx")
By default, the to_excel() method writes the data to a sheet named Sheet1. We can choose a different sheet name with the sheet_name parameter:
df.to_excel("my_data.xlsx", sheet_name="Data")
We can also control whether the DataFrame’s row index is written to the file using the index parameter. For example, if we don’t want to include the index in the exported file, we can set index=False.
df.to_excel("my_data.xlsx", sheet_name="Data", index=False)
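A quick way to check the export is to read the file back with read_excel(). This sketch assumes openpyxl is installed and writes to a temporary directory, so the path is only for illustration.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Smith'],
    'Sex': ['M', 'M'],
    'Profession': ['Doctor', 'Engineer'],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'my_data.xlsx')
    # index=False leaves the row labels out of the sheet
    df.to_excel(path, sheet_name='Data', index=False)
    # Read the same sheet back to confirm what was written
    round_trip = pd.read_excel(path, sheet_name='Data')

print(list(round_trip.columns))
```

With index=False the file contains only the three data columns; without it, read_excel() would also return an extra unnamed column holding the old index.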
Moreover, we can format the Excel file by using the openpyxl library. For example, we can set the font style, font size, and alignment of the exported file.
from openpyxl.styles import Font, Alignment
# Create an ExcelWriter that uses the openpyxl engine
writer = pd.ExcelWriter("my_data.xlsx", engine='openpyxl')
# Write the DataFrame to the Excel file
df.to_excel(writer, sheet_name="Data", index=False)
# Get the worksheet by name
worksheet = writer.sheets['Data']
# Set the font style and font size for the header row
font = Font(name='Arial', size=14, bold=True)
for row in worksheet['A1:C1']:
    for cell in row:
        cell.font = font
# Set the vertical and horizontal alignment for all cells
alignment = Alignment(horizontal='center', vertical='center')
for row in worksheet.iter_rows():
    for cell in row:
        cell.alignment = alignment
# Save and close the file (writer.save() was removed in pandas 2.0)
writer.close()
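As an alternative sketch, ExcelWriter can be used as a context manager, which saves and closes the file when the block ends. The example assumes openpyxl is installed, styles the header through worksheet[1] (the first row) instead of a fixed cell range, and reloads the file only to confirm that the font was stored.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook
from openpyxl.styles import Font

df = pd.DataFrame({'Name': ['John', 'Smith'], 'Profession': ['Doctor', 'Engineer']})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'my_data.xlsx')
    # The file is saved and closed automatically on leaving the with-block
    with pd.ExcelWriter(path, engine='openpyxl') as writer:
        df.to_excel(writer, sheet_name='Data', index=False)
        worksheet = writer.sheets['Data']
        # worksheet[1] is the first row, whatever the number of columns
        for cell in worksheet[1]:
            cell.font = Font(name='Arial', size=14, bold=True)

    # Reload the saved file to check the header formatting
    workbook = load_workbook(path)
    header = workbook['Data']['A1']
    header_value = header.value
    header_bold = header.font.bold
    header_size = header.font.size
    workbook.close()

print(header_value, header_bold, header_size)
```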
By following these simple steps, we can easily export our pandas DataFrame to an Excel file and format the exported file as required using openpyxl.
Customizing Your Excel Output
When exporting a DataFrame to Excel, we might want to customize how our data appears in the Excel file. Pandas provides several options for customizing our Excel output, including formatting options and conditional formatting. In this section, we’ll go through some examples.
Formatting Options
We can customize the formatting of our Excel file using the to_excel() method. For example, we can specify the number of decimal places to include for our numerical data:
df.to_excel("my_data.xlsx", sheet_name="Data", float_format="%.2f")
This will set the number of decimal places to 2 for our float columns.
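The sample DataFrame above has no float columns, so here is a minimal sketch with a made-up prices table. It assumes openpyxl is installed and reads the file back to show that the values written to the file are rounded to two decimals.

```python
import os
import tempfile

import pandas as pd

prices = pd.DataFrame({'Item': ['A', 'B'], 'Price': [3.14159, 2.71828]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'prices.xlsx')
    # Each float is formatted with "%.2f" before being written
    prices.to_excel(path, sheet_name='Data', index=False, float_format='%.2f')
    written = pd.read_excel(path, sheet_name='Data')

print(written['Price'].tolist())
```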
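We can also specify the format of our date columns. This is done through the date_format and datetime_format parameters of ExcelWriter; to_excel() itself does not accept a date_format argument. For example:
with pd.ExcelWriter("my_data.xlsx", date_format="yyyy-mm-dd", datetime_format="yyyy-mm-dd") as writer:
    df.to_excel(writer, sheet_name="Data")
This will display our date columns in year-month-day format.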
We can also set Excel file properties such as title, author, and description through the openpyxl workbook that the ExcelWriter object exposes as writer.book. For example:
# Write the DataFrame, then set the properties on the underlying workbook
with pd.ExcelWriter("my_data.xlsx", engine="openpyxl") as writer:
    df.to_excel(writer, sheet_name="Data", index=False)
    properties = writer.book.properties
    properties.title = "My Data"
    # openpyxl stores the author under the name "creator"
    properties.creator = "John Smith"
    properties.description = "Sample data export from Python"
Conditional Formatting
Pandas allows us to apply conditional formatting to our Excel output using the Styler object. For example, we can highlight cells that meet a certain criterion, such as values greater than a given threshold:
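def highlight_greater_than(series, threshold):
    """
    Highlight the cell in green if the value is greater than the threshold
    """
    return ['background-color: green' if x > threshold else '' for x in series]
# Apply the conditional formatting to the numeric columns only
# (assumes df has at least one numeric column), then export the styled result
numeric_columns = df.select_dtypes(include='number').columns
styled = df.style.apply(highlight_greater_than, threshold=50, subset=numeric_columns)
styled.to_excel("my_data.xlsx", engine="openpyxl")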
This will highlight cells in green if the value is greater than 50.
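Here is a self-contained sketch of the same approach on a made-up scores table. The helper export_highlighted is hypothetical and is defined but not called here; it assumes jinja2 (needed by Styler) and openpyxl are installed.

```python
import pandas as pd

def highlight_greater_than(series, threshold):
    """Return one CSS string per value: green above the threshold, else empty."""
    return ['background-color: green' if x > threshold else '' for x in series]

scores = pd.DataFrame({'Name': ['John', 'Smith', 'Alex'], 'Score': [45, 72, 58]})

# The function can be checked on a single column before any styling is applied
print(highlight_greater_than(scores['Score'], threshold=50))

def export_highlighted(df, path):
    """Style the Score column and write the styled result to an Excel file."""
    styled = df.style.apply(highlight_greater_than, threshold=50, subset=['Score'])
    # Styler.to_excel() carries the cell colors into the file
    styled.to_excel(path, engine='openpyxl', index=False)
```

Calling export_highlighted(scores, "scores.xlsx") would then write the table with the two scores above 50 shaded green. Restricting the style to the Score column with subset avoids comparing the text in Name against a number.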
We can also use built-in styles provided by pandas, such as gradient styles:
# Apply the gradient style
df.style.background_gradient()
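This will shade the numeric cells along a color gradient. Note that background_gradient() requires matplotlib and, as with any Styler, the formatting reaches Excel only when we call to_excel() on the styled object.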
By using these customization options, we can tailor our Excel output to our specific needs and make our data easier to understand and interpret.
Summary
Pandas makes exporting a DataFrame to Excel straightforward through its to_excel() method. Before exporting, however, it is important to prepare the DataFrame: rename or drop columns as needed and deal with any missing or null values.
When exporting to Excel, we can also customize and format the resulting file to our preferences, such as setting the number of decimal places, using different date formats, and highlighting cells that meet a certain criterion.
Exporting data to Excel is an essential skill for data analysts and developers alike. Following the steps in this article and experimenting with the customization options can make the data in our Excel files much easier to read and interpret.