Bubble Plot Visualization with Seaborn in Python
Introduction to Bubble Plot with Seaborn
A bubble plot is a scatter plot in which each point is drawn as a circle, or “bubble”, whose size encodes a third variable and whose color can encode a fourth. This makes it possible to show three or four dimensions of a dataset in a single chart and to spot trends or relationships between variables at a glance. In this article, we will explore how to create bubble plots using Seaborn, a popular data visualization library in Python.
Properties and Parameters of Bubble Plots
Creating a bubble plot with Seaborn involves the scatterplot() function, which has several key parameters for generating a visually appealing and informative plot:
- data: The input dataset (usually a Pandas DataFrame).
- x and y: Variables from the dataset that define the horizontal and vertical axes.
- size: A variable from the dataset that controls the size of the bubbles. This parameter is optional, but it is what turns a plain scatter plot into a bubble plot by adding another dimension of information.
- hue: A variable from the dataset that determines the color of the bubbles. This parameter is also optional, but it is helpful for differentiating between categories.
- sizes: A tuple of two numbers specifying the minimum and maximum bubble size, which controls the overall scaling of the bubbles.
- palette: The color palette used for the hue levels.
- style: A variable from the dataset that determines the marker shape of each point. This is not the plot theme: themes such as “white”, “dark”, “whitegrid”, “darkgrid”, or “ticks” are set separately with sns.set_style().
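import seaborn as sns

# Template only: replace each placeholder with a real name or value
# Data preparation
data = sns.load_dataset("your_dataset_here")

# Creating the bubble plot
sns.scatterplot(
    data=data,
    x="variable_x",
    y="variable_y",
    size="variable_size",        # column mapped to bubble size
    hue="variable_hue",          # column mapped to bubble color
    sizes=(min_size, max_size),  # two numbers, for example (20, 200)
    palette="color_palette",
    style="variable_style",      # optional: column mapped to marker shape
)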
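The template above cannot run until its placeholders are filled in. As a minimal runnable sketch, the following builds a small synthetic DataFrame (the column names x, y, weight, and group are invented for illustration, not taken from any real dataset) and passes it to scatterplot() with the parameters described above.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic data, invented purely for illustration
rng = np.random.default_rng(seed=0)
n = 60
df = pd.DataFrame({
    "x": rng.normal(50, 15, n),
    "y": rng.normal(100, 30, n),
    "weight": rng.integers(1, 500, n),        # numeric column for bubble size
    "group": rng.choice(["A", "B", "C"], n),  # categorical column for color
})

# The theme is set globally, separately from the scatterplot() call
sns.set_style("whitegrid")

fig, ax = plt.subplots(figsize=(7, 5))
sns.scatterplot(
    data=df,
    x="x",
    y="y",
    size="weight",
    hue="group",
    sizes=(20, 400),  # smallest and largest marker size
    palette="Set2",
    alpha=0.7,        # some transparency so overlapping bubbles stay visible
    ax=ax,
)
ax.set_title("Synthetic bubble plot")
fig.tight_layout()
```

In a script, call plt.show() or fig.savefig("bubble.png") afterwards to see the result; in a notebook the figure is displayed automatically.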
Simplified Real-Life Example
As an example, let’s consider a dataset containing information about various countries, such as GDP per capita, life expectancy, population, and region. We’ll visualize the relationship between GDP per capita and life expectancy, while also taking into account the population size and regional affiliations.
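import pandas as pd
import seaborn as sns

# Load the dataset. Seaborn's bundled example datasets do not appear to include
# gapminder, so this assumes a local CSV (hypothetical file name) that has the
# columns gdpPercap, lifeExp, pop, and region.
data = pd.read_csv("gapminder.csv")

# Create the bubble plot
sns.scatterplot(
    data=data,
    x="gdpPercap",
    y="lifeExp",
    size="pop",       # bubble size follows population
    hue="region",     # bubble color follows region
    sizes=(10, 200),
    palette="Set2",
)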
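Population and GDP per capita both span several orders of magnitude, so a linear size mapping tends to leave most bubbles tiny next to a few giants. One possible remedy, sketched here on synthetic stand-in data (the values are randomly generated and are not real country statistics), is to pass a logarithmic normalization through the size_norm parameter and put the x axis on a log scale.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

# Synthetic stand-in for country data; values are random, not real statistics
rng = np.random.default_rng(seed=1)
n = 80
countries = pd.DataFrame({
    "gdpPercap": 10 ** rng.uniform(2.5, 4.8, n),
    "pop": 10 ** rng.uniform(5, 9, n),
    "region": rng.choice(["Africa", "Americas", "Asia", "Europe"], n),
})
# Life expectancy rises with log GDP, plus noise (an invented relationship)
countries["lifeExp"] = (
    20 + 12 * np.log10(countries["gdpPercap"]) + rng.normal(0, 3, n)
)

fig, ax = plt.subplots(figsize=(7, 5))
sns.scatterplot(
    data=countries,
    x="gdpPercap",
    y="lifeExp",
    size="pop",
    hue="region",
    sizes=(10, 200),
    size_norm=LogNorm(),  # map population to bubble size on a log scale
    palette="Set2",
    ax=ax,
)
ax.set_xscale("log")      # GDP per capita is heavily right-skewed
```

Whether a log size mapping is appropriate depends on the message: it makes small countries visible but also makes size differences look smaller than they are.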
Complex Real-Life Example
For a more complex example, let’s analyze a dataset containing information about startup companies, including their valuations, funding rounds, employee counts, market sectors, and geographical locations. We will visualize the relationship between startup valuation and number of funding rounds, using bubble size for the number of employees and color for the market sector.
import pandas as pd
import seaborn as sns

# Load the dataset and preprocess
data = pd.read_csv("startups.csv")

# Bin valuations into labeled ranges (available as an alternative hue variable)
data["valuation_range"] = pd.cut(
    data["valuation"],
    bins=[0, 1000000, 2500000, 5000000, 10000000, float("inf")],
    labels=["<1M", "1M-2.5M", "2.5M-5M", "5M-10M", ">10M"],
)

# Create the bubble plot
plot = sns.scatterplot(
    data=data,
    x="funding_rounds",
    y="valuation",
    size="employees",
    hue="market_sector",
    sizes=(20, 250),
    palette="RdYlBu",
)

# Customize the plot
plot.set(xscale="log", yscale="log")
plot.set_xlabel("Number of Funding Rounds (log-scaled)")
plot.set_ylabel("Startup Valuation (log-scaled)")
# Move the existing legend outside the axes; it lists both sectors and sizes
sns.move_legend(plot, "upper left", bbox_to_anchor=(1, 1))
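The pd.cut() step is easy to get subtly wrong, so it can help to check it on a few hand-picked numbers first. The sketch below uses invented valuations (not from startups.csv) to show how the bin edges behave: by default each bin includes its right edge, and a value of exactly 0 falls outside the first bin.

```python
import pandas as pd

# Hand-picked valuations, invented for illustration
valuations = pd.Series([500_000, 1_000_000, 3_000_000, 7_500_000, 42_000_000])

bins = [0, 1_000_000, 2_500_000, 5_000_000, 10_000_000, float("inf")]
labels = ["<1M", "1M-2.5M", "2.5M-5M", "5M-10M", ">10M"]

# Default right=True: each bin is (left, right], so exactly 1,000,000
# lands in the first bin despite its "<1M" label
ranges = pd.cut(valuations, bins=bins, labels=labels)
print(ranges.tolist())
# ['<1M', '<1M', '2.5M-5M', '5M-10M', '>10M']

# A valuation of exactly 0 is not in (0, 1000000] and becomes NaN
zero_case = pd.cut(pd.Series([0]), bins=bins, labels=labels)
print(zero_case.isna().tolist())
# [True]
```

If boundary values should go into the upper bin instead, passing right=False to pd.cut() makes each bin include its left edge.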
Personal Tips
- Use the sizes parameter wisely to strike a balance between visual clarity and conveying information. Avoid creating plots with too many overlapping and cluttered bubbles.
- Choose an appropriate color palette with easily distinguishable colors to clearly signify the categories or levels in the hue variable.
- Customize your plots by adding labels, legends, and scales to provide context and clarity to the visualization.
- Experiment with different plot themes, set through sns.set_style(), and choose the one that complements your data visualization objective.
- Always validate your visualizations by interpreting and cross-checking the observations to ensure accurate representation of the data.
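One way to act on the last tip is to compare what the plot seems to show against a few summary numbers. This is only a sketch on synthetic data (the columns x, y, weight, and group are invented), and the particular checks are a suggestion rather than a fixed recipe.

```python
import numpy as np
import pandas as pd

# Synthetic data with a deliberately built-in upward trend
rng = np.random.default_rng(seed=2)
n = 100
x = rng.uniform(0, 10, n)
df = pd.DataFrame({
    "x": x,
    "y": 2.0 * x + rng.normal(0, 1.0, n),
    "weight": rng.integers(1, 1000, n),
    "group": rng.choice(["A", "B", "C"], n),
})

# Does the trend visible in the plot hold numerically? (Pearson correlation)
trend = df["x"].corr(df["y"])

# How many points does each color stand for?
group_counts = df["group"].value_counts()

# What range does the bubble-size variable actually cover?
weight_summary = df["weight"].describe()[["min", "50%", "max"]]

print(f"correlation between x and y: {trend:.2f}")
print(group_counts)
print(weight_summary)
```

If the numbers disagree with the visual impression, for example a strong apparent trend with a weak correlation, the size or color encoding may be drawing the eye to a handful of large bubbles.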