Calculating Mean Value Using mean() Function in Pandas
What is the mean() function in Pandas?
The mean() method in pandas calculates the arithmetic average of the values in a Series, or of each column (or row) of a DataFrame. It is used to summarize a numerical dataset and understand its central tendency. mean() is meant for numerical data: on a DataFrame that also contains text columns, select the numeric columns first or pass numeric_only=True.
To use the mean() function in Pandas, first import the library with import pandas as pd. Then create a DataFrame, or use an existing one, and call the function on it. Here are some examples:
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})
# Calculate the mean of the DataFrame
df.mean()
The above code will output the mean value of each column in the DataFrame.
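The result of df.mean() is itself a pandas Series indexed by column name, so individual means can be read by label. A minimal sketch using the same DataFrame (the variable names column_means, mean_a and mean_b are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})

# df.mean() returns a Series whose index holds the column names
column_means = df.mean()
print(column_means)

# Individual means can be read by column label
mean_a = column_means['A']  # (1 + 2 + 3 + 4) / 4
mean_b = column_means['B']  # (5 + 6 + 7 + 8) / 4
print(mean_a, mean_b)
```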
The mean() function can also be applied to a specific column of a DataFrame. Here’s an example:
# Calculate the mean of column A in the DataFrame
df['A'].mean()
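It is important to note that the mean() function excludes missing values (NaN) by default. Setting the skipna parameter to False does not average over the missing values; instead, the result is NaN whenever the data contains at least one missing value, which is useful for detecting incomplete data.
# Returns NaN if column A contains any missing value
# (column A has none here, so the result is still 2.5)
df['A'].mean(skipna=False)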
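Because the DataFrame above has no missing values, the effect of skipna is easier to see on data that does. The following is a minimal sketch, assuming a hypothetical DataFrame df_nan with one NaN in column A:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: column A contains one missing value
df_nan = pd.DataFrame({'A': [1.0, 2.0, np.nan, 4.0]})

# Default (skipna=True): the NaN is ignored, so the mean is (1 + 2 + 4) / 3
mean_skip = df_nan['A'].mean()

# skipna=False: any NaN in the column makes the result NaN
mean_keep = df_nan['A'].mean(skipna=False)

print(mean_skip)
print(mean_keep)
```

With the default setting, the divisor is the number of non-missing values (3 here), not the total number of rows.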
In conclusion, the mean() function in Pandas is a simple yet powerful tool to calculate the average of numerical data in Python. It can be applied to both DataFrames and specific columns, making it a versatile method for data analysis.
How to apply the mean() function to a dataset?
Once you have imported the Pandas library and created a DataFrame of your dataset, applying the mean() function is a straightforward process.
To apply the mean() function on a dataset, simply call the function on the DataFrame, like this:
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})
# Calculate the mean of the DataFrame
df.mean()
The above code will output the mean of each column in the DataFrame.
You can also apply the mean() function to a specific column by calling the function on that specific column, like this:
# Calculate the mean of column A in the DataFrame
df['A'].mean()
This will output the mean of only the values in column A.
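As a quick sanity check, the mean of a column is the sum of its non-missing values divided by how many there are. A small sketch that verifies this by hand for column A (manual_mean is just an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})

# sum() adds the non-missing values; count() counts them
manual_mean = df['A'].sum() / df['A'].count()  # (1 + 2 + 3 + 4) / 4

print(manual_mean)
print(manual_mean == df['A'].mean())
```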
It is important to note that the mean() function excludes missing values (NaN) by default. If you set the skipna parameter to False, missing values are no longer skipped, and the result is NaN whenever the column contains at least one of them.
# Returns NaN if column A contains any missing value
df['A'].mean(skipna=False)
In summary, to apply the mean() function on a dataset in Pandas, simply call the function on the DataFrame or on the specific column you want to average. Keep in mind that the function excludes missing values by default; with skipna set to False, any missing value makes the result NaN.
Examples of mean() function in Pandas
Here are a few examples of how to use the mean() function in Pandas for different use cases:
Example 1: Finding Mean of a Numeric Column
import pandas as pd
# Create a sample DataFrame
data = {'student_name': ['Alice', 'Bob', 'Cathy', 'David'],
'math_score': [90, 80, 95, 70],
'english_score': [85, 75, 80, 90]}
df = pd.DataFrame(data)
# Calculate the mean of math_score column
math_mean = df['math_score'].mean()
print(f"The mean math score is {math_mean}")
Output:
The mean math score is 83.75
Example 2: Finding Mean of Multiple Numeric Columns
import pandas as pd
# Create a sample DataFrame
data = {'student_name': ['Alice', 'Bob', 'Cathy', 'David'],
'math_score': [90, 80, 95, 70],
'english_score': [85, 75, 80, 90]}
df = pd.DataFrame(data)
# Calculate the mean of both math_score and english_score columns
mean_scores = df[['math_score', 'english_score']].mean()
print(f"The mean math and english scores are:\n{mean_scores}")
Output:
The mean math and english scores are:
math_score 83.75
english_score 82.50
dtype: float64
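Selecting the numeric columns by name is one way to keep the text column student_name out of the calculation. Another option is the numeric_only parameter of DataFrame.mean(). A minimal sketch on the same sample data (numeric_means is an illustrative name):

```python
import pandas as pd

data = {'student_name': ['Alice', 'Bob', 'Cathy', 'David'],
        'math_score': [90, 80, 95, 70],
        'english_score': [85, 75, 80, 90]}
df = pd.DataFrame(data)

# numeric_only=True restricts the calculation to numeric columns,
# so student_name is left out without listing the columns by hand
numeric_means = df.mean(numeric_only=True)
print(numeric_means)
```

This is convenient when a DataFrame has many numeric columns. In pandas 2.0 and later, calling df.mean() without it on a DataFrame that contains a text column raises a TypeError.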
Example 3: Adding a Row of Column Means to a DataFrame
import pandas as pd
# Create a sample DataFrame
data = {'student_name': ['Alice', 'Bob', 'Cathy', 'David'],
'math_score': [90, 80, 95, 70],
'english_score': [85, 75, 80, 90]}
df = pd.DataFrame(data)
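# Add a new row holding the mean of each numeric column.
# numeric_only=True skips the text column student_name, which
# would otherwise raise a TypeError in pandas 2.0 and later.
df.loc['Mean'] = df.mean(numeric_only=True)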
print(df)
Output:
student_name math_score english_score
0 Alice 90.0 85.0
1 Bob 80.0 75.0
2 Cathy 95.0 80.0
3 David 70.0 90.0
Mean NaN 83.75 82.50
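The code above averages down each column. If the goal is instead one average per student, that is, a mean across each row, the axis parameter can be set to 1. A minimal sketch on the same sample data, where the new column name average_score is only an illustrative choice:

```python
import pandas as pd

data = {'student_name': ['Alice', 'Bob', 'Cathy', 'David'],
        'math_score': [90, 80, 95, 70],
        'english_score': [85, 75, 80, 90]}
df = pd.DataFrame(data)

# axis=1 computes the mean across the selected columns for every row,
# giving one value per student
df['average_score'] = df[['math_score', 'english_score']].mean(axis=1)
print(df)
```

Here Alice's value is (90 + 85) / 2 = 87.5, and each other row is computed the same way.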
As these examples show, the mean() function can be applied to a single column, to a selection of columns, or to an entire DataFrame.
Summary
The mean() function in Pandas provides a convenient way to calculate the average value of a dataset and understand its central tendency. It works on datasets of any size and can be applied to entire DataFrames as well as to specific columns. To make the most of it, ensure that the columns you average contain numeric data, and that you understand how parameters such as skipna change the handling of missing values. With these skills, you will be better equipped to analyze your data and make informed decisions based on your findings.