· Pandas · 5 min read
Filtering Data with Pandas Query Function
Basic Syntax and Usage
Pandas query() function is a powerful tool for filtering data from pandas dataframes. The function filters a DataFrame by passing a Boolean expression. The syntax for the function is quite intuitive and easy to use:
DataFrame.query(expr, inplace=False, **kwargs)
Here, the expr
parameter should be a string containing the query logic. The inplace
parameter is optional and indicates whether or not the DataFrame should be modified in place, i.e., whether the query should overwrite the original DataFrame. You can also pass any additional kwargs that are relevant to the specific operation you’re performing.
To provide you with an example, let’s consider a DataFrame data
:
import numpy as np
import pandas as pd
data = pd.DataFrame({
'name': ['John', 'Eric', 'Michael'],
'age': [25, 30, 18],
'score': [90, 80, 95],
'pass': [True, False, True]
})
You can use the query() function to filter the information in the DataFrame as follows:
data.query('score > 80')
This would return a new DataFrame containing the rows where the score is greater than 80:
name age score pass
John 25 90 True
Michael 18 95 True
Alternatively, you can apply multiple filters to a DataFrame using the & and | operators. For example, to get the rows in which the score is greater than 80 and the person passed the test, you can use the following query:
data.query('score > 80 & pass == True')
This would return a DataFrame with the following rows:
name age score pass
John 25 90 True
Using the query() function can help you filter and manipulate data easily and effectively. In the next section, we will dive deeper into how to use the function with more complex filtering expressions.
Filtering with Logical Operators
As mentioned in the previous section, the query()
function in pandas allows you to apply multiple filters using the & and | logical operators. These operators filter records based on the conditions provided.
The & (AND) operator returns records where both conditions are true, whereas the | (OR) operator returns records where at least one of the conditions is true.
Let’s look at an example to understand how to use these operators. Consider the following DataFrame data
:
import pandas as pd
import numpy as np
data = pd.DataFrame({
'name': ['John', 'Eric', 'Michael'],
'age': [25, 30, 18],
'score': [90, 80, 95],
'pass': [True, False, True]
})
To select rows where the score is greater than 80 and the person has passed, you can use the & (AND) operator as follows:
result = data.query('score > 80 & pass == True')
This would filter the information in the DataFrame and return the following result:
name age score pass
0 John 25 90 True
You can also use the | (OR) operator to filter the DataFrame. To select rows where the score is greater than 80 or the person has passed, you can use the following query:
result = data.query('score > 80 | pass == True')
This would return all the rows that satisfy either one of these conditions:
name age score pass
0 John 25 90 True
2 Michael 18 95 True
By using the logical operators in Pandas query() function, you can filter data dynamically, apply multiple filters and extract the relevant information to analyze your data.
Filtering with Conditional Statements
In addition to the logical operators, you can also use other conditional statements to filter data in Pandas query() function.
Pandas query() function provides multiple conditional statements for filtering data, such as in
, not in
, like
, between
, isna
, and notna
.
Let’s look at an example to understand how to use these statements. Consider the following DataFrame data
:
import pandas as pd
import numpy as np
data = pd.DataFrame({
'name': ['John', 'Eric', 'Michael', 'Lucy'],
'age': [25, 30, 18, 22],
'score': [90, 80, 95, 85],
'pass': [True, False, True, True]
})
To select rows where the name is either ‘John’ or ‘Lucy’, you can use the in
statement as follows:
result = data.query("name in ['John', 'Lucy']")
This would filter the information in the DataFrame and return the following result:
name age score pass
0 John 25 90 True
3 Lucy 22 85 True
You can also use the not in
statement to filter the records that are not in the specified values. To select rows where the name is not either ‘John’ or ‘Lucy’, you can use the following query:
result = data.query("name not in ['John', 'Lucy']")
This would return the following result:
name age score pass
1 Eric 30 80 False
2 Michael 18 95 True
Similarly, you can use other conditional statements like like
, between
, isna
, and notna
to filter data and extract the relevant information to analyze your data.
Summary
Pandas query() function is a powerful tool that allows you to filter data from pandas dataframes efficiently. In this blog post, we covered the basics of using the query() function, and learned how to filter data using logical operators and conditional statements like in
, not in
, like
, between
, isna
, and notna
. With query() function, filtering data becomes more intuitive and easier to read. I hope this article helps you to improve your data analysis skills and make your data more manageable. Keep practicing and experimenting with the query() function to enhance your data filtering capabilities in Pandas.