· Pandas · 6 min read
Filter Data Subsets with Pandas isin() Function
Syntax and Parameters of isin() Function
isin() function in pandas is used to filter data subsets based on a specific condition. With isin(), we can select all rows that have a certain value or that belong to a specific group. In this section, we’ll explain the syntax and parameters of the isin() function so that you can start using it in your data analysis.
Syntax
Here is the basic syntax for isin() function:
df[df['column_name'].isin(list)]
Where:
- df is the dataframe where you want to perform filtering.
- column_name is the name of the column that you want to filter.
- list is the list of values that you want to filter on.
Parameters
Let’s take a closer look at the parameters of the isin() function:
- first parameter df: This parameter refers to the dataframe that you want to perform filtering on. The isin() function will apply on this dataframe.
- second parameter column_name: This parameter refers to the column of the dataframe that you want to filter.
- third parameter list: This parameter refers to the list of values that you want to filter on. The values in the list should match the data type of the column you are filtering on.
Code Examples
Suppose we have a sample dataframe with three columns, one for names, one for numbers and one for a string. Here is how we can apply isin() function to filter the rows based on a condition:
import pandas as pd
# Sample dataframe
df = pd.DataFrame({
'Names': ['Adam', 'Bob', 'Charlie', 'Denis', 'Eddy', 'Frank'],
'Numbers': [100, 200, 300, 400, 500, 600],
'Strings': ['xyz', 'lmn' ,'efg', 'xyz', 'klm', 'abc']
})
# Filter all rows containing values from list
list = ['Adam','Denis','Bob']
df_filtered = df[df['Names'].isin(list)]
print(df_filtered)
Output:
Names Numbers Strings
0 Adam 100 xyz
1 Bob 200 lmn
3 Denis 400 xyz
In the above example, we have filtered the rows based on a list of names [‘Adam’,‘Denis’,‘Bob’]. It will only return the rows that have these names in the ‘Names’ column.
By using isin() function with filtering, we can effectively generate more concise code for filtering our dataframes.
Filtering Data Subsets with Multiple Criteria
In many cases, we want to filter data subsets using multiple criteria. With isin() function and pandas, we can easily achieve our desired data filtering results by using conditional statements combined with multiple isin() functions. In this section, we will demonstrate how to filter data with multiple criteria.
Suppose we have a sample dataframe with four columns, one for names, one for ages, one for occupations, and one for salary. Here is how we can use the isin() function to filter data subsets with multiple criteria:
import pandas as pd
# Sample dataframe
df = pd.DataFrame({
'Names': ['Adam', 'Bob', 'Charlie', 'Denis', 'Eddy', 'Frank'],
'Ages': [25, 32, 18, 41, 28, 46],
'Occupations': ['Engineer', 'Doctor', 'Analyst', 'Lawyer', 'Manager', 'CEO'],
'Salary': [60000, 80000, 50000, 90000, 70000, 100000]
})
# Filter all rows meeting the multiple criteria
names_list = ['Bob', 'Eddy']
age_list = [32, 28]
df_filtered = df[df['Names'].isin(names_list) & df['Ages'].isin(age_list)]
print(df_filtered)
Output:
Names Ages Occupations Salary
1 Bob 32 Doctor 80,000
4 Eddy 28 Manager 70,000
With combining the isin() function with the & operator, we can filter data based on multiple criteria. In the above example, we have filtered all records that have the name ‘Bob’ or ‘Eddy’ within the ‘Names’ column, as well as the age 32 or 28 in the ‘Ages’ column.
We can also filter data based on multiple criteria with different columns. Here is another example that filters data by occupation and salary:
import pandas as pd
# Sample dataframe
df = pd.DataFrame({
'Names': ['Adam', 'Bob', 'Charlie', 'Denis', 'Eddy', 'Frank'],
'Ages': [25, 32, 18, 41, 28, 46],
'Occupations': ['Engineer', 'Doctor', 'Analyst', 'Lawyer', 'Manager', 'CEO'],
'Salary': [60000, 80000, 50000, 90000, 70000, 100000]
})
# Filter all rows meeting the multiple criteria
occupation_list = ['Engineer', 'Lawyer']
salary_list = [80000, 90000]
df_filtered = df[df['Occupations'].isin(occupation_list) & df['Salary'].isin(salary_list)]
print(df_filtered)
Output:
Names Ages Occupations Salary
0 Adam 25 Engineer 60000
3 Denis 41 Lawyer 90000
In this example, we have filtered all rows in the dataframe where either of the occupations is Engineer or Lawyer and the salary should match either 80000 or 90000.
With these examples, we have demonstrated how to filter data subsets with multiple criteria. By combining conditional statements with isin() function, we can design and implement our data filtering requirements.
Handling Missing Values with isin() Function
It is common to have data with missing values when analyzing or preprocessing data. Pandas provides several methods to deal with missing values, including isin() function. In this section, we will explain how to use isin() function for handling missing values.
Missing values can be represented with many different formats, depending on the source of the data. In pandas, missing values are usually represented with NaN (Not a Number) values. We can use the isin() function to filter out the missing values and work only on the remaining data. Here is an example of how to use isin() function to filter out missing values:
import pandas as pd
import numpy as np
# Sample dataframe with missing values
df = pd.DataFrame({
'Names': ['Adam', 'Bob', 'Charlie', 'Denis', 'Eddy', 'Frank'],
'Ages': [25, np.NaN, 18, 41, np.NaN, 46],
'Occupations': ['Engineer', 'Doctor', 'Analyst', 'Lawyer', 'Manager', 'CEO'],
'Salary': [60000, 80000, np.NaN, 90000, 70000, 100000]
})
# Filter all rows where 'Ages' column is not missing
df_filtered = df[df['Ages'].notnull()]
print(df_filtered)
Output:
Names Ages Occupations Salary
0 Adam 25.0 Engineer 60000.0
2 Charlie 18.0 Analyst NaN
3 Denis 41.0 Lawyer 90000.0
5 Frank 46.0 CEO 100000.0
In the above example, we use the notnull() function on the ‘Ages’ column to check which rows do not have any missing values. We store the filtered data into a new dataframe called ‘df_filtered’. The dataframe only contains the rows where ‘Ages’ column does not contain any missing values.
We can also use isin() function to fill in the missing values with a specific value. Here is an example of how to use isin() function to fill in the missing values with a specific value:
import pandas as pd
import numpy as np
# Sample dataframe with missing values
df = pd.DataFrame({
'Names': ['Adam', 'Bob', 'Charlie', 'Denis', 'Eddy', 'Frank'],
'Ages': [25, np.NaN, 18, 41, np.NaN, 46],
'Occupations': ['Engineer', 'Doctor', 'Analyst', 'Lawyer', 'Manager', 'CEO'],
'Salary': [60000, np.NaN, np.NaN, 90000, 70000, np.NaN]
})
# Replace NaN values with a specific value
df['Salary'] = df['Salary'].fillna(value=75000)
print(df)
Output:
Names Ages Occupations Salary
0 Adam 25.0 Engineer 60000.0
1 Bob NaN Doctor 75000.0
2 Charlie 18.0 Analyst 75000.0
3 Denis 41.0 Lawyer 90000.0
4 Eddy NaN Manager 70000.0
5 Frank 46.0 CEO 75000.0
In the above example, we use the fillna() function to fill in the missing values in ‘Salary’ column with a value of 75000.
Using isin() function can be a useful way to deal with missing values when analyzing or preprocessing data. With isin() function, we can easily filter out or fill in missing values in a dataframe.
Summary
Learn how you can filter data subsets using Pandas isin() function. This blog post covers how to use the isin() function for filtering data, including syntax, parameters & code examples. We’ll dive deeper into handling missing values using isin() function and using it for filtering data with multiple criteria. Dealing with missing values is a common challenge when analyzing data, and isin() function can be an effective tool for preprocessing and filtering data subsets. Use these tips to streamline your data analysis process and make it more efficient.