· Pandas · 6 min read

Filter Data Subsets with Pandas isin() Function

Syntax and Parameters of isin() Function

isin() function in pandas is used to filter data subsets based on a specific condition. With isin(), we can select all rows that have a certain value or that belong to a specific group. In this section, we’ll explain the syntax and parameters of the isin() function so that you can start using it in your data analysis.

Syntax

Here is the basic syntax for isin() function:

df[df['column_name'].isin(list)]

Where:

  • df is the dataframe where you want to perform filtering.
  • column_name is the name of the column that you want to filter.
  • list is the list of values that you want to filter on.

Parameters

Let’s take a closer look at the parameters of the isin() function:

  • first parameter df: This parameter refers to the dataframe that you want to perform filtering on. The isin() function will apply on this dataframe.
  • second parameter column_name: This parameter refers to the column of the dataframe that you want to filter.
  • third parameter list: This parameter refers to the list of values that you want to filter on. The values in the list should match the data type of the column you are filtering on.

Code Examples

Suppose we have a sample dataframe with three columns, one for names, one for numbers and one for a string. Here is how we can apply isin() function to filter the rows based on a condition:

import pandas as pd

# Sample dataframe
df = pd.DataFrame({
      'Names': ['Adam', 'Bob', 'Charlie', 'Denis', 'Eddy', 'Frank'],
      'Numbers': [100, 200, 300, 400, 500, 600],
      'Strings': ['xyz', 'lmn' ,'efg', 'xyz', 'klm', 'abc']
   })

# Filter all rows containing values from list
list = ['Adam','Denis','Bob']
df_filtered = df[df['Names'].isin(list)]
print(df_filtered)

Output:

   Names  Numbers Strings
0   Adam      100     xyz
1    Bob      200     lmn
3  Denis      400     xyz

In the above example, we have filtered the rows based on a list of names [‘Adam’,‘Denis’,‘Bob’]. It will only return the rows that have these names in the ‘Names’ column.

By using isin() function with filtering, we can effectively generate more concise code for filtering our dataframes.

Filtering Data Subsets with Multiple Criteria

In many cases, we want to filter data subsets using multiple criteria. With isin() function and pandas, we can easily achieve our desired data filtering results by using conditional statements combined with multiple isin() functions. In this section, we will demonstrate how to filter data with multiple criteria.

Suppose we have a sample dataframe with four columns, one for names, one for ages, one for occupations, and one for salary. Here is how we can use the isin() function to filter data subsets with multiple criteria:

import pandas as pd

# Sample dataframe
df = pd.DataFrame({
  'Names': ['Adam', 'Bob', 'Charlie', 'Denis', 'Eddy', 'Frank'],
  'Ages': [25, 32, 18, 41, 28, 46],
  'Occupations': ['Engineer', 'Doctor', 'Analyst', 'Lawyer', 'Manager', 'CEO'],
  'Salary': [60000, 80000, 50000, 90000, 70000, 100000]
})

# Filter all rows meeting the multiple criteria
names_list = ['Bob', 'Eddy']
age_list = [32, 28]
df_filtered = df[df['Names'].isin(names_list) & df['Ages'].isin(age_list)]
print(df_filtered)

Output:

   Names  Ages   Occupations  Salary
1    Bob    32        Doctor    80,000
4   Eddy    28       Manager    70,000

With combining the isin() function with the & operator, we can filter data based on multiple criteria. In the above example, we have filtered all records that have the name ‘Bob’ or ‘Eddy’ within the ‘Names’ column, as well as the age 32 or 28 in the ‘Ages’ column.

We can also filter data based on multiple criteria with different columns. Here is another example that filters data by occupation and salary:

import pandas as pd

# Sample dataframe
df = pd.DataFrame({
  'Names': ['Adam', 'Bob', 'Charlie', 'Denis', 'Eddy', 'Frank'],
  'Ages': [25, 32, 18, 41, 28, 46],
  'Occupations': ['Engineer', 'Doctor', 'Analyst', 'Lawyer', 'Manager', 'CEO'],
  'Salary': [60000, 80000, 50000, 90000, 70000, 100000]
})

# Filter all rows meeting the multiple criteria
occupation_list = ['Engineer', 'Lawyer']
salary_list = [80000, 90000]
df_filtered = df[df['Occupations'].isin(occupation_list) & df['Salary'].isin(salary_list)]
print(df_filtered)

Output:

Names   Ages   Occupations   Salary
0    Adam     25     Engineer    60000
3   Denis     41       Lawyer    90000

In this example, we have filtered all rows in the dataframe where either of the occupations is Engineer or Lawyer and the salary should match either 80000 or 90000.

With these examples, we have demonstrated how to filter data subsets with multiple criteria. By combining conditional statements with isin() function, we can design and implement our data filtering requirements.

Handling Missing Values with isin() Function

It is common to have data with missing values when analyzing or preprocessing data. Pandas provides several methods to deal with missing values, including isin() function. In this section, we will explain how to use isin() function for handling missing values.

Missing values can be represented with many different formats, depending on the source of the data. In pandas, missing values are usually represented with NaN (Not a Number) values. We can use the isin() function to filter out the missing values and work only on the remaining data. Here is an example of how to use isin() function to filter out missing values:

import pandas as pd
import numpy as np

# Sample dataframe with missing values
df = pd.DataFrame({
  'Names': ['Adam', 'Bob', 'Charlie', 'Denis', 'Eddy', 'Frank'],
  'Ages': [25, np.NaN, 18, 41, np.NaN, 46],
  'Occupations': ['Engineer', 'Doctor', 'Analyst', 'Lawyer', 'Manager', 'CEO'],
  'Salary': [60000, 80000, np.NaN, 90000, 70000, 100000]
})

# Filter all rows where 'Ages' column is not missing
df_filtered = df[df['Ages'].notnull()]
print(df_filtered)

Output:

   Names  Ages   Occupations  Salary
0   Adam   25.0     Engineer  60000.0
2   Charlie  18.0      Analyst     NaN
3   Denis   41.0      Lawyer    90000.0
5   Frank   46.0      CEO       100000.0

In the above example, we use the notnull() function on the ‘Ages’ column to check which rows do not have any missing values. We store the filtered data into a new dataframe called ‘df_filtered’. The dataframe only contains the rows where ‘Ages’ column does not contain any missing values.

We can also use isin() function to fill in the missing values with a specific value. Here is an example of how to use isin() function to fill in the missing values with a specific value:

import pandas as pd
import numpy as np

# Sample dataframe with missing values
df = pd.DataFrame({
  'Names': ['Adam', 'Bob', 'Charlie', 'Denis', 'Eddy', 'Frank'],
  'Ages': [25, np.NaN, 18, 41, np.NaN, 46],
  'Occupations': ['Engineer', 'Doctor', 'Analyst', 'Lawyer', 'Manager', 'CEO'],
  'Salary': [60000, np.NaN, np.NaN, 90000, 70000, np.NaN]
})

# Replace NaN values with a specific value
df['Salary'] = df['Salary'].fillna(value=75000)

print(df)

Output:

   Names  Ages   Occupations  Salary
0   Adam   25.0     Engineer  60000.0
1   Bob    NaN   Doctor      75000.0
2   Charlie  18.0      Analyst   75000.0
3   Denis   41.0      Lawyer    90000.0
4   Eddy   NaN    Manager     70000.0
5   Frank   46.0      CEO       75000.0

In the above example, we use the fillna() function to fill in the missing values in ‘Salary’ column with a value of 75000.

Using isin() function can be a useful way to deal with missing values when analyzing or preprocessing data. With isin() function, we can easily filter out or fill in missing values in a dataframe.

Summary

Learn how you can filter data subsets using Pandas isin() function. This blog post covers how to use the isin() function for filtering data, including syntax, parameters & code examples. We’ll dive deeper into handling missing values using isin() function and using it for filtering data with multiple criteria. Dealing with missing values is a common challenge when analyzing data, and isin() function can be an effective tool for preprocessing and filtering data subsets. Use these tips to streamline your data analysis process and make it more efficient.