· Pandas · 5 min read

Filtering Data with Pandas Query Function

Basic Syntax and Usage

Pandas query() function is a powerful tool for filtering data from pandas dataframes. The function filters a DataFrame by passing a Boolean expression. The syntax for the function is quite intuitive and easy to use:

DataFrame.query(expr, inplace=False, **kwargs)

Here, the expr parameter should be a string containing the query logic. The inplace parameter is optional and indicates whether or not the DataFrame should be modified in place, i.e., whether the query should overwrite the original DataFrame. You can also pass any additional kwargs that are relevant to the specific operation you’re performing.

To provide you with an example, let’s consider a DataFrame data:

import numpy as np
import pandas as pd

data = pd.DataFrame({
    'name': ['John', 'Eric', 'Michael'],
    'age': [25, 30, 18],
    'score': [90, 80, 95],
    'pass': [True, False, True]
})

You can use the query() function to filter the information in the DataFrame as follows:

data.query('score > 80')

This would return a new DataFrame containing the rows where the score is greater than 80:

name      age     score    pass
John      25      90       True
Michael   18      95       True

Alternatively, you can apply multiple filters to a DataFrame using the & and | operators. For example, to get the rows in which the score is greater than 80 and the person passed the test, you can use the following query:

data.query('score > 80 & pass == True')

This would return a DataFrame with the following rows:

name     age      score    pass
John     25       90       True

Using the query() function can help you filter and manipulate data easily and effectively. In the next section, we will dive deeper into how to use the function with more complex filtering expressions.

Filtering with Logical Operators

As mentioned in the previous section, the query() function in pandas allows you to apply multiple filters using the & and | logical operators. These operators filter records based on the conditions provided.

The & (AND) operator returns records where both conditions are true, whereas the | (OR) operator returns records where at least one of the conditions is true.

Let’s look at an example to understand how to use these operators. Consider the following DataFrame data:

import pandas as pd
import numpy as np

data = pd.DataFrame({
    'name': ['John', 'Eric', 'Michael'],
    'age': [25, 30, 18],
    'score': [90, 80, 95],
    'pass': [True, False, True]
})

To select rows where the score is greater than 80 and the person has passed, you can use the & (AND) operator as follows:

result = data.query('score > 80 & pass == True')

This would filter the information in the DataFrame and return the following result:

  name   age  score  pass
0  John  25    90      True

You can also use the | (OR) operator to filter the DataFrame. To select rows where the score is greater than 80 or the person has passed, you can use the following query:

result = data.query('score > 80 | pass == True')

This would return all the rows that satisfy either one of these conditions:

      name  age   score  pass
0     John  25     90     True
2  Michael  18     95     True

By using the logical operators in Pandas query() function, you can filter data dynamically, apply multiple filters and extract the relevant information to analyze your data.

Filtering with Conditional Statements

In addition to the logical operators, you can also use other conditional statements to filter data in Pandas query() function.

Pandas query() function provides multiple conditional statements for filtering data, such as in, not in, like, between, isna, and notna.

Let’s look at an example to understand how to use these statements. Consider the following DataFrame data:

import pandas as pd
import numpy as np

data = pd.DataFrame({
    'name': ['John', 'Eric', 'Michael', 'Lucy'],
    'age': [25, 30, 18, 22],
    'score': [90, 80, 95, 85],
    'pass': [True, False, True, True]
})

To select rows where the name is either ‘John’ or ‘Lucy’, you can use the in statement as follows:

result = data.query("name in ['John', 'Lucy']")

This would filter the information in the DataFrame and return the following result:

   name  age  score  pass
0  John  25   90     True
3  Lucy  22   85     True

You can also use the not in statement to filter the records that are not in the specified values. To select rows where the name is not either ‘John’ or ‘Lucy’, you can use the following query:

result = data.query("name not in ['John', 'Lucy']")

This would return the following result:

     name  age  score  pass
1    Eric  30   80     False
2  Michael  18   95     True

Similarly, you can use other conditional statements like like, between, isna, and notna to filter data and extract the relevant information to analyze your data.

Summary

Pandas query() function is a powerful tool that allows you to filter data from pandas dataframes efficiently. In this blog post, we covered the basics of using the query() function, and learned how to filter data using logical operators and conditional statements like in, not in, like, between, isna, and notna. With query() function, filtering data becomes more intuitive and easier to read. I hope this article helps you to improve your data analysis skills and make your data more manageable. Keep practicing and experimenting with the query() function to enhance your data filtering capabilities in Pandas.