Renaming Columns in Python Pandas

We’re now going to look at renaming columns in pandas. To follow along, you can check out our previous post, Importing CSV files with Python Pandas, where we loaded data into a pandas dataframe from a CSV file.

So we now have a pandas dataframe called raw that looks like this.

date	est_ref	capacity	occupancy	rooms_sold	avg_rate_paid	sales_value
2022-12-27	0	289	0.75	217	35.97	7805.49
2022-12-27	1	203	0.35	71	82.31	5844.01
2022-12-27	2	207	0.51	106	227.83	24149.98
2022-12-27	3	27	0.37	10	126.46	1264.60
2022-12-27	4	20	0.87	17	191.57	3256.69

We don’t like some of the column names, so we’re going to change them.

Renaming a single column

We don’t like the look of the est_ref column. We believe this should be est_key, where est stands for establishment. This key column will link to another table in the future.

Let’s take a look at how we can rename this in pandas.

import pandas as pd

raw = pd.read_csv("sales.csv")

# Added this line of code
raw = raw.rename(columns={'est_ref': 'est_key'})

raw.head(5)

As you can see, we have added the line raw = raw.rename(columns={'est_ref': 'est_key'}). The rename() function, which is available on pandas dataframes, does the renaming: it returns a new dataframe with the renamed columns and leaves the original untouched. We then reassign the variable raw to that new dataframe.
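To see that rename() returns a new dataframe rather than changing the original, here is a minimal sketch. The small in-memory dataframe sample and the variable renamed are assumptions for illustration, standing in for the data loaded from sales.csv.

```python
import pandas as pd

# Small stand-in for sales.csv (illustrative values only)
sample = pd.DataFrame({"date": ["2022-12-27", "2022-12-27"], "est_ref": [0, 1]})

# rename() returns a new dataframe; without reassignment, sample is unchanged
renamed = sample.rename(columns={"est_ref": "est_key"})

print(list(sample.columns))   # ['date', 'est_ref']
print(list(renamed.columns))  # ['date', 'est_key']
```

This is why the tutorial code assigns the result back to raw: without the assignment, the renamed dataframe would simply be discarded.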

Rename Function

The rename() function is used to rename one or more columns in a pandas DataFrame.
To rename a single column, you can pass a dictionary to the columns parameter, where the keys are the old column names and the values are the new column names. This is what we did above. columns={'old_name': 'new_name'} is the syntax for doing this.
To rename multiple columns, you can pass a dictionary with multiple key-value pairs, where each key is an old column name and each value is the corresponding new column name.
The rename() function has an optional inplace parameter. With False (the default), it returns a new DataFrame with the modified column names. With True, it modifies the existing DataFrame and returns None. So instead of reassigning the raw variable, we can make the change with inplace=True. Here is an example.

import pandas as pd

raw = pd.read_csv("sales.csv")

raw.rename(columns={'est_ref': 'est_key'}, inplace=True)

raw.head(5)
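One thing to watch for with inplace=True is assigning the result back to a variable out of habit. The sketch below uses a small made-up dataframe (sample is an assumption for illustration, not the sales data) to show what the call returns.

```python
import pandas as pd

sample = pd.DataFrame({"est_ref": [0, 1], "capacity": [289, 203]})

# With inplace=True the dataframe itself is modified and the call returns None,
# so the result should not be assigned back to the variable.
result = sample.rename(columns={"est_ref": "est_key"}, inplace=True)

print(result)                # None
print(list(sample.columns))  # ['est_key', 'capacity']
```

Writing raw = raw.rename(..., inplace=True) would therefore replace the dataframe with None.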

Renaming Multiple columns

If we wanted to rename multiple columns, we would simply pass more key-value pairs in the columns dictionary, separated by commas.

Example

import pandas as pd

raw = pd.read_csv("sales.csv")

raw.rename(columns={'est_ref': 'est_key', 'avg_rate_paid': 'avg_rate'}, inplace=True)

raw.head(5)

Output

Your dataframe should now look like this

date	est_key	capacity	occupancy	rooms_sold	avg_rate	sales_value
2022-12-27	0	289	0.75	217	35.97	7805.49
2022-12-27	1	203	0.35	71	82.31	5844.01
2022-12-27	2	207	0.51	106	227.83	24149.98
2022-12-27	3	27	0.37	10	126.46	1264.60
2022-12-27	4	20	0.87	17	191.57	3256.69
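When the dictionary grows, a typo in one of the old names is easy to miss, because by default rename() ignores keys that do not match any column. As a hedged sketch (the dataframe sample and the misspelled key est_reff are made up for illustration), the errors parameter can turn that silent miss into an exception.

```python
import pandas as pd

sample = pd.DataFrame({"est_ref": [0], "avg_rate_paid": [35.97]})

# A misspelled key is ignored by default, so nothing is renamed
unchanged = sample.rename(columns={"est_reff": "est_key"})
print(list(unchanged.columns))  # ['est_ref', 'avg_rate_paid']

# errors="raise" makes pandas raise a KeyError for the unknown column
try:
    sample.rename(columns={"est_reff": "est_key"}, errors="raise")
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```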

Renaming All columns using a Lambda function

Now this is getting a little more complicated. Don’t worry too much about it at this stage. I wanted to include it in the renaming part because it’s valuable, and ultimately I want the columns named this way as we work through this course. A lambda function is a small, unnamed function written on a single line. When we pass one to rename(), pandas runs it against every column name, so we can change all of the names at once. This allows for some fun things to happen! We will get more into these later, but here is the first example.

Naming Conventions

As a bit of background, in programming we usually follow certain standards and best practices. This keeps our code consistent when working in larger teams, so we can stay productive when working with the code of others. With my background, my favourite way of naming variables, columns, functions, etc. is camelCase.

Popular naming conventions

camelCase: In camelCase, the first letter of each word (except the first word) is capitalized. For example, “salesValue” or “customerAddress.”
PascalCase: In PascalCase, the first letter of each word is capitalized. For example, “SalesValue” or “CustomerAddress.”
snake_case: In snake_case, words are separated by underscores. For example, “sales_value” or “customer_address.”

Changing column names to camelCase

I’m now going to show you the code for using a Lambda function against all of our columns, in order to change the names to camelCase.

import pandas as pd

raw = pd.read_csv("sales.csv")

raw.rename(columns={'est_ref': 'est_key', 'avg_rate_paid': 'avg_rate'}, inplace=True)

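# This lambda converts each column name to camelCase.
# strip() is applied before taking the first character so leading whitespace is handled.
raw.rename(columns=lambda x: x.strip()[0].lower() + x.strip().lower().replace('_', ' ').title().replace(' ', '')[1:], inplace=True)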

raw.head(5)

Here’s a rundown of what’s happening here

We call raw.rename() with inplace=True and pass it a lambda, which pandas applies to each column name (x).
We take the first letter of the column name and lowercase it with .lower().
The rest of the name comes from transforming the whole name as follows, then dropping its first letter with [1:]
- strip() removes any leading or trailing whitespace
- lower() makes everything lowercase
- replace('_', ' ') replaces all the snake case underscores _ with a space
- title() capitalises the first letter of every word, like you have in a title
- replace(' ', '') replaces the spaces ' ' with nothing ''

I know this may be a little bit confusing. Feel free to stick with snake_case if you prefer, or just run the code and we will pick up more advanced things like this later in the tutorial.
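If the one-line lambda is hard to read, the same idea can be written as a named function and passed to rename() instead. This is only a sketch: to_camel_case is a hypothetical helper name, the empty dataframe sample stands in for the real data, and it splits on underscores rather than using title(), which gives the same result for simple names like ours.

```python
import pandas as pd

def to_camel_case(name):
    """Convert a snake_case column name such as 'sales_value' to 'salesValue'."""
    words = name.strip().lower().split("_")
    # Keep the first word lowercase and capitalise the first letter of the rest
    return words[0] + "".join(word.capitalize() for word in words[1:])

# Empty dataframe with the column names from the earlier output
sample = pd.DataFrame(columns=["date", "est_key", "rooms_sold", "avg_rate", "sales_value"])
sample = sample.rename(columns=to_camel_case)

print(list(sample.columns))
# ['date', 'estKey', 'roomsSold', 'avgRate', 'salesValue']
```

A named function is easier to test on its own, for example by calling to_camel_case('rooms_sold') directly.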

Output

Your dataframe should now look like this.

date	estKey	capacity	occupancy	roomsSold	avgRate	salesValue
2022-12-27	0	289	0.75	217	35.97	7805.49
2022-12-27	1	203	0.35	71	82.31	5844.01
2022-12-27	2	207	0.51	106	227.83	24149.98
2022-12-27	3	27	0.37	10	126.46	1264.60
2022-12-27	4	20	0.87	17	191.57	3256.69

Conclusion

You should now be able to manually rename columns in a dataframe. When you are working with larger datasets, though, it sometimes makes sense to do it programmatically. We’ve gone through an example of changing snake_case to camelCase in column names. Although this particular change is a matter of preference rather than necessity, the concept is important. Data from third-party systems often comes with spaces in column names, inconsistent capitalisation and so on, and being able to fix this with a lambda function instead of renaming columns one by one is valuable.
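As a rough sketch of that last point, the example below cleans up some messy headers in one pass. The column names in messy are invented for illustration, and converting to snake_case here is just one possible target convention.

```python
import pandas as pd

# Hypothetical messy headers, as might arrive from a third-party export
messy = pd.DataFrame(columns=[" Est Ref", "Rooms Sold ", "AVG rate paid"])

# Trim whitespace, lowercase everything, and replace spaces with underscores
clean = messy.rename(columns=lambda x: x.strip().lower().replace(" ", "_"))

print(list(clean.columns))
# ['est_ref', 'rooms_sold', 'avg_rate_paid']
```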
