Creating Histograms with Seaborn in Python
Introduction
Seaborn is a popular Python data visualization library that builds on top of Matplotlib and provides a high-level interface for creating statistical graphics. Histograms group values into bins and show how many observations fall into each one, which makes them useful for understanding the distribution of a dataset. This article shows how to create and customize histograms with Seaborn so you can analyze and present data effectively.
Seaborn Properties and Parameters
To create histograms in Seaborn, we will use the displot() function, a figure-level function that draws a histogram by default and offers several options for customizing the plot. The key parameters to be aware of are:
- data: The input data, usually a Pandas DataFrame. It is paired with x, which names the column to plot (alternatively, a NumPy array or list can be passed directly as x).
- bins: The number of bins (rectangular bars) in the histogram. More bins show finer detail of the distribution, but also more noise. Besides an integer, bins accepts a list of bin edges or the name of a binning rule such as "sqrt".
- kde: Short for Kernel Density Estimation. When set to True, a smooth curve estimating the underlying probability density is drawn on top of the histogram.
- rug: When set to True, a small vertical tick is drawn along the x-axis for each data point, showing where individual observations fall.
- color: This parameter can be used to change the color of the histogram bars.
Depending on the dataset and the analysis objectives, you can adjust these parameters to obtain the desired visualization results.
Simplified Real-Life Example
Let’s assume you are working with a dataset containing the heights of individuals, and you would like to visualize the distribution of heights using a histogram. First, import the necessary libraries and put the data into a Pandas DataFrame.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
# Heights of ten individuals
height_data = {"height": [160, 165, 167, 150, 178, 185, 173, 162, 175, 172]}
df = pd.DataFrame(height_data)
Now, let’s create a simple histogram using Seaborn with a custom number of bins and color.
sns.displot(data=df, x='height', bins=8, color='blue')
plt.show()  # needed when running a script; notebooks display the figure automatically
This will produce a histogram with eight bins, displaying the distribution of individuals’ heights in the dataset.
Complex Real-Life Example
In this more complex example, we will look at a dataset containing both the heights and weights of individuals and visualize each variable in its own histogram, side by side in a single figure. First, import the libraries and create the DataFrame.
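import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
data = {"height": [160, 165, 167, 150, 178, 185, 173, 162, 175, 172],
        "weight": [50, 60, 70, 65, 80, 75, 85, 55, 68, 72]}
df = pd.DataFrame(data)
In this case, we will use Seaborn’s FacetGrid class to create a multi-plot grid with two histograms, one for height and one for weight. Calling df.melt() first reshapes the data into long format, with a "variable" column naming the measurement and a "value" column holding the numbers, so the grid can split the data by variable.
# One panel per variable; each panel keeps its own axis ranges
g = sns.FacetGrid(df.melt(), col="variable", sharey=False, sharex=False)
# Draw a histogram with a KDE overlay in each panel
g.map_dataframe(sns.histplot, x="value", bins=8, kde=True, color="red")
plt.show()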
This will generate a grid with two plots, showing the distribution of both heights and weights in the dataset with a KDE curve overlaid on the histograms.
Personal Tips
- Choosing the right number of bins: The number of bins strongly affects how the distribution is represented. Too few bins can hide important features of the data, while too many produce a noisy, jagged plot. A common rule of thumb is the “square root rule,” which suggests using about the square root of the number of data points as the number of bins, rounded up (4 bins for the 10 heights above).
- Scaling the axis: With skewed data it can help to change the axis scale. histplot() and displot() accept a log_scale parameter, and because Seaborn draws on Matplotlib axes, you can also call the Matplotlib Axes methods set_xscale() and set_yscale() with scales such as "log", "symlog", "logit", or "function" for a custom transform.
- Customizing the appearance: Seaborn has various built-in themes and color palettes, available through functions such as sns.set_theme() and sns.color_palette(), that change the appearance of your plots. Experiment with different styles to find the one that works best for your project.
- Layering plots: Seaborn lets you combine several plots in one figure. You can draw different plot types onto the same Matplotlib axes through the ax parameter, or use functions such as jointplot(), which pairs a scatter plot with marginal histograms, to gain more insight from your data.
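The square root rule from the first tip is simple to compute. The sketch below does it by hand and compares the result with NumPy's "sqrt" estimator in numpy.histogram_bin_edges(); since Seaborn passes its bins argument on to that NumPy function, writing bins="sqrt" in displot() or histplot() should apply the same rule.

```python
import math

import numpy as np

heights = [160, 165, 167, 150, 178, 185, 173, 162, 175, 172]

# Square root rule: about sqrt(n) bins, rounded up to a whole number
manual_bins = math.ceil(math.sqrt(len(heights)))

# NumPy's "sqrt" estimator returns bin edges; there is one more edge than bins
edges = np.histogram_bin_edges(heights, bins="sqrt")
numpy_bins = len(edges) - 1

print(manual_bins, numpy_bins)  # 4 4
```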
By understanding and utilizing these tips, you can create powerful and insightful histograms with Seaborn in Python to effectively analyze and display your data.