How to Fill NaN Values in a Pandas DataFrame
Using .fillna() method to fill NaN values
One of the most direct ways to fill NaN (Not a Number) values in a Pandas DataFrame is the .fillna() method. It fills missing data with a single scalar value, or with per-column values supplied as a dict, Series, or DataFrame. It does not accept a function, but a value computed from the data, such as a column mean, can be passed in the same way.
The basic syntax for the .fillna() method is as follows:
DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)
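Note that recent pandas 2.x releases deprecate the method and downcast parameters; the dedicated .ffill() and .bfill() methods are the recommended replacement for method.
Below are some examples of using the .fillna() method to fill in missing data with different parameters: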
Example 1: Fill NaN values with a single value
import pandas as pd
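# Sample data: None is stored as NaN in the numeric columns
data = {'Name': ['John', 'Peter', 'Amy', 'David', 'Linda'],
        'Age': [25, 47, None, None, 34],
        'Height': [174.5, 180.3, 167.8, None, 162.1],
        'Score': [80, 92, 88, 75, None]}
df = pd.DataFrame(data)
# Replace every NaN in the DataFrame with 0, modifying df in place
df.fillna(0, inplace=True)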
In the above example, all NaN values in the DataFrame are filled with the value 0.
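Filling every column with the same value is not always appropriate. Besides a scalar, .fillna() also accepts a dict that maps column names to fill values. The sketch below reuses the sample data from Example 1; the particular defaults in fill_values are hypothetical and chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Peter', 'Amy', 'David', 'Linda'],
    'Age': [25, 47, None, None, 34],
    'Height': [174.5, 180.3, 167.8, None, 162.1],
    'Score': [80, 92, 88, 75, None],
})

# Hypothetical per-column defaults, picked only to illustrate the dict form.
fill_values = {
    'Age': 0,
    'Height': df['Height'].median(),  # median() skips NaN by default
    'Score': -1,
}

# Without inplace=True, fillna returns a new DataFrame and leaves df unchanged.
filled = df.fillna(value=fill_values)
print(filled)
```

Columns that are not listed in the dict are left as they are, so this form is a reasonable choice when only some columns have a sensible default.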
Example 2: Fill NaN values with Forward Fill method
import pandas as pd
data = {'Name': ['John', 'Peter', 'Amy', 'David', 'Linda'],
        'Age': [25, 47, None, 28, 34],
        'Height': [174.5, 180.3, 167.8, None, 162.1],
        'Score': [80, 92, 88, None, None]}
df = pd.DataFrame(data)
# .ffill() is the current spelling of the deprecated fillna(method='ffill')
df.ffill(inplace=True)
In this example, the Forward Fill method is used to fill in the missing data. The method propagates the last valid observation forward to fill the missing data.
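To make the propagation concrete, here is a minimal sketch on a small Series (the values are made up for illustration). It compares forward fill, backward fill, and forward fill capped with limit.

```python
import pandas as pd

s = pd.Series([1.0, None, None, 4.0, None])

forward = s.ffill()        # carry the last valid value forward
backward = s.bfill()       # pull the next valid value backward
capped = s.ffill(limit=1)  # fill at most one consecutive NaN per gap

print(forward.tolist())
print(backward.tolist())
print(capped.tolist())
```

Forward fill cannot fill a NaN that has no valid value before it, and backward fill cannot fill one that has no valid value after it, which is why the last element of backward stays NaN here.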
Example 3: Fill NaN values with Custom Function
import pandas as pd
data = {'Name': ['John', 'Peter', 'Amy', 'David', 'Linda'],
        'Age': [25, 47, None, None, 34],
        'Height': [174.5, 180.3, 167.8, None, 162.1],
        'Score': [80, 92, 88, 75, None]}
df = pd.DataFrame(data)
def fill_avg(column):
    # Fill NaN values with the column mean; mean() skips NaN by default
    return column.fillna(column.mean())
df['Age'] = fill_avg(df['Age'])
df['Height'] = fill_avg(df['Height'])
df['Score'] = fill_avg(df['Score'])
In the above example, a custom function fill_avg() is created to fill in the missing data with the mean of the column. The function is then applied to the relevant columns.
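The same result can be reached without a helper function, since .fillna() accepts a Series whose index matches the column names. The following is a sketch under that assumption, reusing the sample data; column_means is a name introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Peter', 'Amy', 'David', 'Linda'],
    'Age': [25, 47, None, None, 34],
    'Height': [174.5, 180.3, 167.8, None, 162.1],
    'Score': [80, 92, 88, 75, None],
})

# numeric_only=True skips the string column 'Name' when computing means.
column_means = df.mean(numeric_only=True)

# Each column is filled with the entry of column_means that shares its name.
filled = df.fillna(column_means)
print(filled)
```

This fills every numeric column in one call, which can be more convenient than assigning column by column when many columns need the same treatment.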
Using .replace() method for filling NaN values
Another way to fill NaN values in a Pandas DataFrame is the .replace() method. It is particularly useful when specific values, including NaN itself, need to be swapped for another specific value.
The basic syntax for the .replace() method is as follows:
DataFrame.replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad')
Note that recent pandas 2.x releases deprecate the limit and method parameters of .replace().
Below are some examples of using the .replace() method to fill in missing data with different parameters:
Example 1: Replace NaN values with a single value
import pandas as pd
import numpy as np
# np.nan is used throughout because the np.NaN alias was removed in NumPy 2.0
data = {'Name': ['John', 'Peter', 'Amy', 'David', 'Linda'],
        'Age': [25, 47, np.nan, 30, 34],
        'Height': [174.5, 180.3, 167.8, 155.2, 162.1],
        'Score': [80, None, 88, 75, None]}
df = pd.DataFrame(data)
df.replace(np.nan, -1, inplace=True)
In the above example, every NaN value in the DataFrame is replaced with the value -1. This includes the None entries in Score, which pandas stores as NaN in a numeric column.
Example 2: Replace multiple NaN values with a single value
import pandas as pd
import numpy as np
data = {'Name': ['John', 'Peter', 'Amy', 'David', 'Linda'],
        'Age': [25, 47, np.nan, 30, np.nan],
        'Height': [174.5, np.nan, 167.8, 155.2, 162.1],
        'Score': [80, np.nan, 88, 75, None]}
df = pd.DataFrame(data)
df.replace(np.nan, -1, inplace=True)
In this example, multiple NaN values in the DataFrame are replaced with the value -1.
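The .replace() method also works in the opposite direction, when missing data arrives encoded as placeholder values rather than NaN. The sketch below assumes a hypothetical dataset where -999 marks a missing Age and 'N/A' marks a missing City; both placeholders and the City column are invented for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Peter', 'Amy'],
    'Age': [25, -999, 34],            # -999 is a hypothetical "missing" marker
    'City': ['Oslo', 'N/A', 'Lima'],  # 'N/A' is a hypothetical "missing" marker
})

# The dict form looks for each placeholder only in the named column.
cleaned = df.replace({'Age': -999, 'City': 'N/A'}, np.nan)

# Once the placeholders are real NaN values, fillna can treat them uniformly.
filled = cleaned.fillna({'Age': cleaned['Age'].mean(), 'City': 'Unknown'})
print(filled)
```

Converting placeholders to NaN first matters because statistics such as mean() skip NaN but would otherwise include -999 in the calculation.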
Example 3: Replace NaN values with a calculated value
import pandas as pd
import numpy as np
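data = {'Name': ['John', 'Peter', 'Amy', 'David', 'Linda'],
        'Age': [25, 47, np.nan, 30, 34],
        'Height': [174.5, 180.3, 167.8, 155.2, 162.1],
        'Score': [80, None, 88, 75, None]}
df = pd.DataFrame(data)
# Restrict the replacement to the Age column so that Score is left untouched
df['Age'] = df['Age'].replace(np.nan, df['Age'].mean())
In this example, NaN values in the Age column of the DataFrame are replaced with the mean value of that column. Calling .replace() on the whole DataFrame would instead write the Age mean into every column that contains NaN, including Score.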
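The scope of a .replace() call is easy to get wrong, because calling it on the DataFrame applies the replacement to every column. The following sketch, using the Age and Score columns from the sample data, contrasts a DataFrame-wide replacement with one restricted to a single column; the variable names are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 47, np.nan, 30, 34],
    'Score': [80, None, 88, 75, None],
})
age_mean = df['Age'].mean()  # (25 + 47 + 30 + 34) / 4, NaN is skipped

# DataFrame-wide: the NaN values in Score also receive the Age mean.
everywhere = df.replace(np.nan, age_mean)

# Column-only: Score keeps its NaN values.
age_only = df.copy()
age_only['Age'] = age_only['Age'].replace(np.nan, age_mean)

print(everywhere)
print(age_only)
```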
Using .interpolate() method for NaN values
Another powerful way to fill in the missing data in a Pandas DataFrame is the .interpolate() method. It is particularly useful for time series and other ordered data, where a missing value can be estimated from the values on either side of it.
The basic syntax for the .interpolate() method is as follows:
DataFrame.interpolate(method='linear', axis=0, limit=None, inplace=False, limit_direction='forward', limit_area=None, downcast=None, **kwargs)
Below are some examples of using the .interpolate() method to fill in missing data with different parameters:
Example 1: Interpolating missing values using Linear Method
import pandas as pd
import numpy as np
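data = {'Name': ['John', 'Peter', 'Amy', 'David', 'Linda'],
        'Age': [25, 47, np.nan, 30, 34],
        'Height': [174.5, 180.3, 167.8, 155.2, 162.1],
        'Score': [80, 85, np.nan, 75, 82]}
df = pd.DataFrame(data)
# Interpolate only the numeric columns; the string column Name cannot be interpolated
num_cols = ['Age', 'Height', 'Score']
df[num_cols] = df[num_cols].interpolate(method='linear')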
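In the above example, the linear method fills each missing value with a point on the straight line between its neighbors, treating the rows as equally spaced. The missing Age becomes 38.5 (midway between 47 and 30) and the missing Score becomes 80.0 (midway between 85 and 75).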
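The arithmetic behind linear interpolation can be checked on the two affected columns in isolation. This is a minimal sketch that uses the same Age and Score values as standalone Series.

```python
import numpy as np
import pandas as pd

age = pd.Series([25, 47, np.nan, 30, 34])
score = pd.Series([80, 85, np.nan, 75, 82])

# A single NaN between two valid values is filled with their midpoint.
age_filled = age.interpolate(method='linear')      # (47 + 30) / 2
score_filled = score.interpolate(method='linear')  # (85 + 75) / 2

print(age_filled.tolist())
print(score_filled.tolist())
```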
Example 2: Interpolating missing values using Time Method
import pandas as pd
import numpy as np
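data = {'Value': [1, 2, np.nan, 4, 5],
        'Time': pd.date_range('20210101', periods=5)}
# method='time' requires a DatetimeIndex, so the Time column becomes the index
df = pd.DataFrame(data).set_index('Time')
df.interpolate(method='time', inplace=True)
In this example, the time method fills the missing value in proportion to the time elapsed between the surrounding timestamps. It only works on a Series or DataFrame with a DatetimeIndex, which is why the Time column is set as the index. Because the dates here are evenly spaced, the result (3.0) is the same as with linear interpolation.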
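The difference between the linear and time methods only shows up when the timestamps are unevenly spaced. The sketch below uses made-up dates with a one-day gap followed by a three-day gap to illustrate this.

```python
import numpy as np
import pandas as pd

# Hypothetical, unevenly spaced timestamps: 1 day, then 3 days.
idx = pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-05'])
s = pd.Series([1.0, np.nan, 5.0], index=idx)

# 'linear' ignores the index and treats the rows as equally spaced.
by_position = s.interpolate(method='linear')

# 'time' weights by elapsed time: 1 of 4 days has passed at the missing point.
by_time = s.interpolate(method='time')

print(by_position.iloc[1])
print(by_time.iloc[1])
```

With linear interpolation the missing value is the midpoint, 3.0. With time interpolation it is 1 + (5 - 1) * 1/4 = 2.0, because only a quarter of the interval has elapsed.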
Example 3: Setting Limit for Interpolation
import pandas as pd
import numpy as np
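data = {'Name': ['John', 'Peter', 'Amy', 'David', 'Linda'],
        'Age': [25, 47, np.nan, 30, 34],
        'Height': [174.5, 180.3, 167.8, 155.2, 162.1],
        'Score': [np.nan, 85, 88, 75, np.nan]}
df = pd.DataFrame(data)
# Interpolate only the numeric columns, filling at most one consecutive NaN per gap
num_cols = ['Age', 'Height', 'Score']
df[num_cols] = df[num_cols].interpolate(method='linear', limit=1)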
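In this example, a limit of 1 is set for interpolation, meaning that at most one consecutive NaN value is filled in each gap. Every gap in this data is a single NaN, so the limit only makes a difference when a column has a run of two or more missing values. With the default limit_direction='forward', the leading NaN in Score is not filled, because there is no earlier value to interpolate from.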
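The effect of limit is easier to see on a longer run of missing values. The following sketch uses a made-up Series with three consecutive NaN values and compares interpolation with and without the limit.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])

# No limit: the whole gap is filled along the line from 1.0 to 5.0.
unlimited = s.interpolate(method='linear')

# limit=1: only the first NaN of the gap is filled, the rest stay NaN.
limited = s.interpolate(method='linear', limit=1)

print(unlimited.tolist())
print(limited.tolist())
```

A limit is a reasonable safeguard when long gaps should stay visible as missing instead of being filled with estimates that span many rows.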
Summary
In this article, we explored different methods to fill in the missing data in a Pandas DataFrame. We discussed how to use the .fillna() method to fill NaN values with a single value, with forward fill, and with a value computed by a custom function. We also looked at how to use the .replace() method to replace NaN values with a specific value or a calculated value. Finally, we examined the .interpolate() method, which estimates missing values from their neighbors. The choice of method largely depends on the nature of the data and the goals of your analysis, so pick the method and parameters that suit your needs.