
Creating and Customizing Heatmaps with Seaborn in Python

Introduction to Heatmaps and Seaborn

Heatmaps are an effective way to visualize large datasets and discover patterns in the data. They represent values using a color spectrum, enabling a quick visual overview of complex datasets. In this article, we’ll dive into the Seaborn library, a powerful Python visualization library built on top of Matplotlib, to create and customize heatmaps.

Properties and Parameters in Seaborn Heatmaps

Seaborn provides a heatmap() function, which makes it easy to generate heatmaps. Let’s look at the key properties and parameters you should be aware of when creating heatmaps:

  1. Data: The dataset you want to visualize. It should be in a rectangular format, like a Pandas DataFrame or a NumPy array.

  2. Cmap: This is the color map used to represent the data values in the heatmap. Seaborn supports a variety of color maps, such as ‘viridis’, ‘magma’, and ‘coolwarm’.

  3. Annot: Set this to True if you want to include the data values in each cell of the heatmap. Otherwise, it will only show the cell colors.

  4. Fmt: The format string applied to the annotations when annot is set to True. It uses Python’s format-specification syntax rather than printf style; for example, “.1f” displays values with one decimal place.

  5. Cbar: A boolean parameter to display or hide a color bar next to the heatmap that shows the mapping between values and colors.

  6. Cbar_kws: A dictionary containing additional parameters for customizing the color bar, such as “label”.

  7. Square: Set this to True if you want to ensure each cell of the heatmap is a square.

  8. Linewidths: The width of the lines separating the cells, passed as the linewidths parameter.
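
To see how these parameters fit together, here is a minimal sketch that passes all of them in a single call. The 4x5 grid of random values and the “Value” color bar label are made up for illustration; only the heatmap() keyword arguments come from Seaborn itself.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Hypothetical data: a 4x5 grid of random values between 0 and 1
rng = np.random.default_rng(0)
values = rng.random((4, 5))

fig, ax = plt.subplots(figsize=(6, 4))
sns.heatmap(
    values,                       # data: any rectangular array or DataFrame
    cmap="viridis",               # color map
    annot=True,                   # write each value inside its cell
    fmt=".1f",                    # one decimal place (format-spec syntax)
    cbar=True,                    # show the color bar
    cbar_kws={"label": "Value"},  # extra options forwarded to the color bar
    square=True,                  # force square cells
    linewidths=0.5,               # width of the lines between cells
    ax=ax,
)
```

Calling plt.show() afterwards (or fig.savefig with a file name of your choice) displays or saves the result.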

Simplified Real-Life Example

Let’s start with a basic example to showcase how to create a heatmap using Seaborn. We’ll create a heatmap of a correlation matrix for a simple dataset.

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Sample dataset
data = {'A': [1, 2, 3, 4],
        'B': [3, 4, 1, 2],
        'C': [2, 3, 4, 1],
        'D': [4, 1, 2, 3]}

df = pd.DataFrame(data)

# Calculate correlation matrix
corr_matrix = df.corr()

# Create the heatmap
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')

# Display the figure (needed when running as a script)
plt.show()

In this example, we first import the required libraries and create a sample dataset in a Pandas DataFrame. Then we calculate the correlation matrix and create the heatmap using the Seaborn heatmap() function with annotations and the ‘coolwarm’ color map.

Complex Real-Life Example

Now, let’s look at a more complex example using a real-world dataset. We’ll use the Titanic dataset, which contains information about the passengers aboard the Titanic and their survival. We’ll explore the relationships between various features and their impact on the survival rates.

import matplotlib.pyplot as plt
import seaborn as sns

# Load the Titanic dataset from Seaborn (fetched online on first use)
titanic = sns.load_dataset('titanic')

# Clean the dataset: drop columns that duplicate others or are mostly missing
titanic = titanic.drop(columns=['embark_town', 'class', 'who', 'adult_male', 'deck', 'alive', 'alone'])

# Fill missing values; assigning the result back avoids the chained
# in-place fillna that recent pandas versions warn about
titanic['embarked'] = titanic['embarked'].fillna(titanic['embarked'].mode()[0])
titanic['age'] = titanic['age'].fillna(titanic['age'].median())

# Convert categorical columns to integer codes
titanic['embarked'] = titanic['embarked'].astype('category').cat.codes
titanic['sex'] = titanic['sex'].astype('category').cat.codes

# Calculate correlation matrix
corr_matrix = titanic.corr()

# Create the heatmap
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', cbar_kws={'label': 'Correlation'})
plt.show()

In this example, we first load the Titanic dataset using Seaborn and clean the dataset by dropping unnecessary columns, filling missing values, and converting categorical variables to numerical. Then, we calculate the correlation matrix of the cleaned dataset and create a heatmap with annotations and a color bar showing the correlation scale.
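
One caveat about converting categorical variables to numerical: integer codes impose an order on categories that have none, such as the port of embarkation, and that order can distort a correlation coefficient. A possible alternative is one 0/1 indicator column per category, sketched below with pandas.get_dummies. The six-row passengers table is a made-up stand-in for the real data, used here only so the snippet runs without a download.

```python
import pandas as pd

# Hypothetical mini-sample with the same kinds of columns as the Titanic data
passengers = pd.DataFrame({
    "survived": [0, 1, 1, 0, 1, 0],
    "fare": [7.25, 71.28, 7.92, 8.05, 53.10, 8.46],
    "embarked": ["S", "C", "S", "S", "C", "Q"],
})

# Replace the single text column with one 0/1 indicator column per port
encoded = pd.get_dummies(passengers, columns=["embarked"], dtype=int)

# Every column is now numeric, so the correlation matrix covers all of them
corr_matrix = encoded.corr()
print(corr_matrix.round(2))
```

The resulting matrix has a separate row and column for each port, so each correlation can be read on its own instead of through a single arbitrary code.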

Personal Tips

  1. Select an appropriate color map for your heatmap; it should be easily interpretable and visually appealing. Sequential color maps like ‘viridis’ work well for values that run in one direction, such as counts or strictly positive measurements, while diverging color maps like ‘coolwarm’ suit data with a meaningful midpoint, such as correlations that range from -1 to 1.

  2. Always consider the size and aspect ratio of the heatmap to ensure it is readable and clear, especially when working with large datasets.

  3. Be cautious when interpreting heatmaps of large datasets, as small differences between values can map to nearly identical colors.

  4. Experiment with different parameters and customizations to make the heatmap more informative and visually appealing.
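
The first two tips can be sketched as follows. This is only an illustration under stated assumptions: the 50x6 table of random numbers and the 8x6 inch figure size are arbitrary choices, not recommendations from a specific dataset. Fixing vmin and vmax at -1 and 1 keeps the diverging color map centered on zero, so the neutral color always means “no correlation”.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: 50 rows of random values in six columns
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.normal(size=(50, 6)), columns=list("ABCDEF"))
corr_matrix = df.corr()

# Set the figure size explicitly so the annotations stay readable
fig, ax = plt.subplots(figsize=(8, 6))

sns.heatmap(
    corr_matrix,
    cmap="coolwarm",  # diverging map for values around a midpoint
    vmin=-1,          # fix the color scale to the full correlation range
    vmax=1,
    annot=True,
    fmt=".2f",
    ax=ax,
)
```

For larger matrices, increasing figsize or turning off annot are two simple ways to keep the plot legible.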

In conclusion, heatmaps are a powerful tool for visualizing complex data and finding patterns. Seaborn makes it easy to create and customize heatmaps in Python, offering a high degree of flexibility to suit various datasets and use cases.