Renaming Columns in Python Pandas
We’re now going to look at renaming columns in pandas. To follow along, check out our previous post, which covers how to load data into a pandas DataFrame from a CSV file.
So we now have a pandas DataFrame, raw, that looks like this.
date | est_ref | capacity | occupancy | rooms_sold | avg_rate_paid | sales_value |
---|---|---|---|---|---|---|
2022-12-27 | 0 | 289 | 0.75 | 217 | 35.97 | 7805.49 |
2022-12-27 | 1 | 203 | 0.35 | 71 | 82.31 | 5844.01 |
2022-12-27 | 2 | 207 | 0.51 | 106 | 227.83 | 24149.98 |
2022-12-27 | 3 | 27 | 0.37 | 10 | 126.46 | 1264.60 |
2022-12-27 | 4 | 20 | 0.87 | 17 | 191.57 | 3256.69 |
We don’t like some of the column names, so we’re going to change them.
Renaming a single column
We don’t like the look of the est_ref column. We believe it should be est_key. Est stands for Establishment. This key column will link to another table in the future.
Let’s take a look at how we can rename this in pandas.
import pandas as pd
raw = pd.read_csv("sales.csv")
# Added this line of code
raw = raw.rename(columns={'est_ref': 'est_key'})
raw.head(5)
As you can see, we have added the line raw = raw.rename(columns={'est_ref': 'est_key'}). We are reassigning the variable raw, which currently holds a DataFrame, to the result of calling raw.rename(). That method is available on every pandas DataFrame and returns a new DataFrame with the renamed columns, which is why we assign the result back to raw.
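To see that rename() really does hand back a new DataFrame, here is a minimal sketch. The two-row DataFrame is a made-up stand-in for sales.csv, and the renamed variable is a name I have introduced just for this illustration.

```python
import pandas as pd

# Hypothetical stand-in for the data loaded from sales.csv
raw = pd.DataFrame({
    "date": ["2022-12-27", "2022-12-27"],
    "est_ref": [0, 1],
    "capacity": [289, 203],
})

# rename() returns a new DataFrame; raw itself is not changed
renamed = raw.rename(columns={"est_ref": "est_key"})

print(list(raw.columns))      # ['date', 'est_ref', 'capacity']
print(list(renamed.columns))  # ['date', 'est_key', 'capacity']
```

If you forget to assign the result to a variable, the renamed DataFrame is simply thrown away.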
Rename Function
- The rename() function is used to rename one or more columns in a pandas DataFrame.
- To rename a single column, you can pass a dictionary to the columns parameter, where the keys are the old column names and the values are the new column names. This is what we did above. The syntax for doing this is columns={'old_name': 'new_name'}.
- To rename multiple columns, you can pass a dictionary with multiple key-value pairs, where each key is an old column name and each value is the corresponding new column name.
- The rename() function has an optional inplace parameter, which can be set to True to modify the DataFrame in place, or False (the default) to create a new DataFrame with the modified column names. Instead of reassigning and essentially overwriting the raw variable, we can make the change with inplace=True. The call then changes raw directly and returns None, so there is nothing to assign back.
Here is an example.
import pandas as pd
raw = pd.read_csv("sales.csv")
raw.rename(columns={'est_ref': 'est_key'}, inplace=True)
raw.head(5)
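One thing to watch out for with inplace=True is that the call returns None, so you should not assign its result back to raw. A small sketch, again using a made-up DataFrame rather than the real sales.csv, with result as a name I have added for illustration:

```python
import pandas as pd

# Hypothetical stand-in for the data loaded from sales.csv
raw = pd.DataFrame({"est_ref": [0, 1], "capacity": [289, 203]})

# With inplace=True the DataFrame is modified directly and None is returned
result = raw.rename(columns={"est_ref": "est_key"}, inplace=True)

print(result)             # None
print(list(raw.columns))  # ['est_key', 'capacity']
```

Writing raw = raw.rename(..., inplace=True) would therefore replace your DataFrame with None.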
Renaming Multiple columns
If we want to rename multiple columns, we simply pass more key-value pairs, separated by commas, into the columns dictionary.
Example
import pandas as pd
raw = pd.read_csv("sales.csv")
raw.rename(columns={'est_ref': 'est_key', 'avg_rate_paid': 'avg_rate'}, inplace=True)
raw.head(5)
Output
Your dataframe should now look like this
date | est_key | capacity | occupancy | rooms_sold | avg_rate | sales_value |
---|---|---|---|---|---|---|
2022-12-27 | 0 | 289 | 0.75 | 217 | 35.97 | 7805.49 |
2022-12-27 | 1 | 203 | 0.35 | 71 | 82.31 | 5844.01 |
2022-12-27 | 2 | 207 | 0.51 | 106 | 227.83 | 24149.98 |
2022-12-27 | 3 | 27 | 0.37 | 10 | 126.46 | 1264.60 |
2022-12-27 | 4 | 20 | 0.87 | 17 | 191.57 | 3256.69 |
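When the dictionary grows, a typo in one of the old names is easy to miss, because rename() skips keys it cannot find by default. The sketch below shows the errors="raise" option of rename(), which turns a missing column into a KeyError instead. The DataFrame, the deliberate typo and the typo_detected flag are all made up for this illustration.

```python
import pandas as pd

# Hypothetical one-row stand-in for the sales data
raw = pd.DataFrame({"est_ref": [0], "avg_rate_paid": [35.97]})

# 'avg_rate_payed' is a deliberate typo; by default it is silently ignored
unchanged = raw.rename(columns={"avg_rate_payed": "avg_rate"})
print(list(unchanged.columns))  # ['est_ref', 'avg_rate_paid']

# With errors="raise", pandas raises a KeyError for the unknown column
try:
    raw.rename(columns={"avg_rate_payed": "avg_rate"}, errors="raise")
    typo_detected = False
except KeyError as exc:
    typo_detected = True
    print(f"Rename failed: {exc}")
```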
Renaming All columns using a Lambda function
Now this is getting a little bit more complicated. Don’t worry too much about this at this stage. I wanted to include it in the renaming part because it’s valuable, and ultimately I want the columns named this way as we work through this course. Lambda functions allow us to run Python code against all of our columns or rows of data within pandas. This allows for some fun things to happen! We will get more into these later, but here is the first example.
Naming Conventions
As a bit of background, in programming we usually follow certain standards and best practices. This ensures that when working with larger teams, our code is consistent and we can continue to be productive when working with the code of others. With my background, my favourite way of naming variables, columns, functions etc is camelCase.
Popular naming conventions
- camelCase: In camelCase, the first letter of each word (except the first word) is capitalized. For example, “salesValue” or “customerAddress.”
- PascalCase: In PascalCase, the first letter of each word is capitalized. For example, “SalesValue” or “CustomerAddress.”
- snake_case: In snake_case, words are separated by underscores. For example, “sales_value” or “customer_address.”
Changing column names to camelCase
I’m now going to show you the code for using a Lambda function against all of our columns, in order to change the names to camelCase.
import pandas as pd
raw = pd.read_csv("sales.csv")
raw.rename(columns={'est_ref': 'est_key', 'avg_rate_paid': 'avg_rate'}, inplace=True)
#This code is doing the conversion
raw.rename(columns=lambda x: x[0].lower() + x.strip().lower().replace('_', ' ').title().replace(' ', '')[1:], inplace=True)
raw.head(5)
Here’s a rundown of what’s happening:
- We call raw.rename() with inplace=True and pass it a lambda, which pandas runs once for each column name x.
- x[0].lower() takes the first letter of the column name and lowercases it.
- The rest of the expression converts the whole name, and the [1:] at the end keeps everything after the first character:
  - strip() removes any leading or trailing whitespace.
  - lower() makes everything lowercase.
  - replace('_', ' ') replaces all the snake_case underscores with spaces.
  - title() capitalises the first letter of every word, like you have in a title.
  - replace(' ', '') then replaces the spaces with nothing, joining the words back together.
I know this may be a little bit confusing. Feel free to stick with snake_case if you prefer, or just run the code, and we will pick up more advanced things like this later in the tutorial.
Output
Your dataframe should now look like this.
date | estKey | capacity | occupancy | roomsSold | avgRate | salesValue |
---|---|---|---|---|---|---|
2022-12-27 | 0 | 289 | 0.75 | 217 | 35.97 | 7805.49 |
2022-12-27 | 1 | 203 | 0.35 | 71 | 82.31 | 5844.01 |
2022-12-27 | 2 | 207 | 0.51 | 106 | 227.83 | 24149.98 |
2022-12-27 | 3 | 27 | 0.37 | 10 | 126.46 | 1264.60 |
2022-12-27 | 4 | 20 | 0.87 | 17 | 191.57 | 3256.69 |
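If the one-line lambda is hard to read, the same idea can be sketched as a named function. Note that to_camel_case is a name I have made up, and it is a rewrite rather than a copy of the lambda: it splits on underscores instead of using title(), so it should only be expected to match for plain snake_case names like the ones in this post. The empty DataFrame is a stand-in that carries just the column names.

```python
import pandas as pd


def to_camel_case(name: str) -> str:
    """Convert a snake_case column name to camelCase (hypothetical helper)."""
    # Trim whitespace, lowercase everything, then split into words
    words = [w for w in name.strip().lower().split("_") if w]
    if not words:
        return name  # nothing to convert, leave the name alone
    # First word stays lowercase, every later word gets a capital letter
    return words[0] + "".join(w.capitalize() for w in words[1:])


# Empty stand-in DataFrame with the column names from this post
raw = pd.DataFrame(columns=["date", "est_key", "rooms_sold", "avg_rate", "sales_value"])

# rename() accepts a function as well as a dictionary
raw = raw.rename(columns=to_camel_case)

print(list(raw.columns))  # ['date', 'estKey', 'roomsSold', 'avgRate', 'salesValue']
```

A named function is easier to test on its own and to reuse across several DataFrames.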
Conclusion
You should now be able to manually rename columns in a DataFrame. When you are working with larger datasets, though, it sometimes makes sense to do it programmatically. We’ve gone through an example of changing snake_case to camelCase in column names. Although this is an opinionated change and not strictly a useful one, the concept is important. Data from third-party systems often arrives with spaces in column names, weird capitalisation and so on, and being able to fix this with a lambda function instead of manually renaming columns one by one is valuable.
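As a rough sketch of that last point, here is how a lambda could tidy up messy headers into snake_case. The column names below are invented to mimic a third-party export and are not from sales.csv.

```python
import pandas as pd

# Hypothetical messy headers, as they might arrive from a third-party system
raw = pd.DataFrame(columns=[" Date ", "Est Ref", "ROOMS sold", "Avg Rate Paid"])

# Trim whitespace, lowercase, and swap spaces for underscores on every column
raw = raw.rename(columns=lambda x: x.strip().lower().replace(" ", "_"))

print(list(raw.columns))  # ['date', 'est_ref', 'rooms_sold', 'avg_rate_paid']
```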