· Pandas · 4 min read
Pandas - Row selection techniques
Using Loc and ILoc for Row Selection
To dive deep into selecting rows using loc and iloc in Pandas, let’s first understand their basic usage.
Using loc for row selection
loc
focuses on label-based indexing. It selects rows and columns by their labels. Here’s an example of how to use loc
:
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charles'],
'Age': [25, 30, 29],
'City': ['New York', 'San Francisco', 'Los Angeles']}
df = pd.DataFrame(data)
# Select the row with the label 1
row = df.loc[1]
print(row)
In this example, df.loc[1]
returns the row with the label 1, showing Bob’s data.
Using iloc for row selection
iloc
, on the other hand, works with integer-based indexing, allowing you to select rows and columns by their integer positions. Here’s an example of using iloc
:
# Select the row with the integer position 0
row = df.iloc[0]
print(row)
In this case, df.iloc[0]
returns the row with the integer position 0, showing Alice’s data.
Using slices in loc and iloc
Both loc
and iloc
support slicing, allowing you to select multiple rows. Here’s an example:
# Select rows with labels 0 to 1 (inclusive) using loc
rows_loc = df.loc[0:1]
# Select rows with integer positions 0 to 1 (exclusive) using iloc
rows_iloc = df.iloc[0:1]
Notice that the slice in loc
is inclusive, while in iloc
, it’s exclusive.
By understanding the differences and appropriate usage of loc
and iloc
, you can efficiently select rows in a Pandas DataFrame based on your needs.
Filtering Rows Based on Conditions
Filtering rows based on conditions is a common requirement in data manipulation, and Pandas makes it quite efficient. Let’s see how we can achieve this with two primary methods:
Using boolean indexing
Boolean indexing allows us to apply conditions on columns and select specific rows that satisfy those conditions. Here’s an example:
# Filter rows where Age is greater than 25
filtered_rows = df[df['Age'] > 25]
print(filtered_rows)
This code snippet outputs the rows where the ‘Age’ column has a value greater than 25.
Using the query method
Another way to filter rows based on conditions is by using the query()
method. It’s an efficient and readable option, especially when dealing with complex conditions. Here’s an example:
# Filter rows where Age is greater than 25 and City is 'New York'
filtered_rows = df.query('Age > 25 & City == "New York"')
print(filtered_rows)
The query()
method accepts a string containing the conditions, making it easy to write and read complex filtering expressions.
By understanding and adapting these filtering techniques, you can effectively select specific rows from your Pandas DataFrame, based on certain conditions.
Selecting Rows with Query Function
The query()
function in Pandas allows you to select rows from a DataFrame based on conditions written as a string expression. It provides a concise and readable syntax for filtering data. Let’s dive into some examples to understand its usage:
Basic filtering
Suppose we want to filter rows where the ‘Age’ column has a value greater than 25:
filtered_rows = df.query('Age > 25')
print(filtered_rows)
The query()
function takes the condition ('Age > 25'
) in a string format and returns a DataFrame with the rows that satisfy the condition.
Combining multiple conditions
You can also filter rows based on multiple conditions using the logical operators &
(and), |
(or), and ~
(not). Here’s an example of filtering rows where the ‘Age’ column has a value greater than 25, and the ‘City’ column is ‘New York’:
filtered_rows = df.query('Age > 25 & City == "New York"')
print(filtered_rows)
Using variables in the query
If you have a variable with the value to filter, you can include it in the query using the @
symbol. For example:
min_age = 25
selected_city = "New York"
filtered_rows = df.query('Age > @min_age & City == @selected_city')
print(filtered_rows)
This allows for better flexibility and reusability of code when filtering rows based on conditions.
By mastering the use of the query()
function, you can make your Pandas DataFrame manipulation more efficient and maintainable.
Summary
In summary, selecting rows in a Pandas DataFrame is a crucial skill for any data manipulator. Using loc and iloc is quite effective for labeled and integer-based indexing, respectively. To filter rows based on certain conditions, boolean indexing and the query() function are powerful tools. Personally, I recommend mastering the query() function for its concise and readable syntax, especially when working with complex filtering conditions. Don’t forget to consider the performance implications when selecting large datasets, and always explore ways to optimize your code for better efficiency. Happy coding!