Pandas, a data manipulation library for Python, provides methods for detecting and handling missing data. In this tutorial, we will cover the isnull, notnull, dropna, and fillna methods.
First, let’s create a sample DataFrame that contains some missing data.
# Import the pandas and NumPy libraries
import pandas as pd
import numpy as np
# Create a DataFrame with missing data
data = {
'Name': ['Alice', 'Bob', None, 'David', 'Eve'],
'Age': [25, None, 35, 40, None],
'City': ['New York', 'Los Angeles', 'Boston', 'Houston', 'Phoenix']
}
df = pd.DataFrame(data)
df
## Name Age City
## 0 Alice 25.0 New York
## 1 Bob NaN Los Angeles
## 2 None 35.0 Boston
## 3 David 40.0 Houston
## 4 Eve NaN Phoenix
In Pandas, NaN (which stands for “Not a Number”) is the standard missing-data marker for floating-point data, while None is Python's built-in way to represent the absence of a value.
When a numeric column contains None, Pandas stores the column as float64 and converts None to NaN, which is why the Age column above shows 25.0 and NaN. In an object-dtype column (such as strings), Pandas keeps None as-is, as in the Name column.
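To see this conversion directly, the minimal sketch below builds two standalone Series (the names `ages` and `names` are illustrative, not part of the tutorial's DataFrame) and inspects their dtypes and stored values.

```python
import numpy as np
import pandas as pd

# A numeric list containing None: pandas infers float64 and stores None as NaN
ages = pd.Series([25, None, 35])
# A string list containing None: the dtype stays object and None is kept as-is
names = pd.Series(['Alice', None, 'David'])

print(ages.dtype)           # float64
print(np.isnan(ages[1]))    # True
print(names.dtype)          # object
print(names[1] is None)     # True

# isnull() treats both markers as missing
print(ages.isnull().tolist())   # [False, True, False]
print(names.isnull().tolist())  # [False, True, False]
```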
isnull
The isnull method returns a DataFrame of the same shape in which each entry is a boolean indicating whether the corresponding value is missing. Both NaN and None count as missing.
# Detect missing values using isnull
missing_data = df.isnull()
missing_data
## Name Age City
## 0 False False False
## 1 False True False
## 2 True False False
## 3 False False False
## 4 False True False
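Because True counts as 1 when summed, the boolean DataFrame returned by isnull can be aggregated to count missing values per column or to find incomplete rows. A minimal sketch that rebuilds the sample DataFrame so it runs on its own; the variable names `missing_per_column` and `rows_with_missing` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David', 'Eve'],
    'Age': [25, None, 35, 40, None],
    'City': ['New York', 'Los Angeles', 'Boston', 'Houston', 'Phoenix'],
})

# Summing the boolean mask counts the missing values in each column
missing_per_column = df.isnull().sum()
print(missing_per_column)

# any(axis=1) is True for rows with at least one missing value
rows_with_missing = df[df.isnull().any(axis=1)]
print(rows_with_missing.index.tolist())  # [1, 2, 4]
```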
notnull
The notnull method is the inverse of isnull: it returns True where a value is present and False where it is missing.
# Detect non-missing values using notnull
non_missing_data = df.notnull()
non_missing_data
## Name Age City
## 0 True True True
## 1 True False True
## 2 False True True
## 3 True True True
## 4 True False True
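A common use of notnull is as a boolean mask for filtering rows. The sketch below, which rebuilds the sample DataFrame and uses the illustrative name `has_age`, keeps only rows where Age is present and checks that notnull matches the negation of isnull.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David', 'Eve'],
    'Age': [25, None, 35, 40, None],
    'City': ['New York', 'Los Angeles', 'Boston', 'Houston', 'Phoenix'],
})

# notnull() on a single column yields a boolean Series usable as a row filter
has_age = df[df['Age'].notnull()]
print(has_age.index.tolist())  # [0, 2, 3]

# notnull() is equivalent to negating isnull() with ~
print(df.notnull().equals(~df.isnull()))  # True
```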
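dropna
The dropna method drops rows or columns that contain missing data. By default (axis=0) it removes every row with at least one missing value; pass axis=1 to remove columns instead. It returns a new DataFrame and leaves df unchanged.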
# Drop rows with missing data
dropped_rows = df.dropna()
dropped_rows
## Name Age City
## 0 Alice 25.0 New York
## 3 David 40.0 Houston
# Drop columns with missing data
dropped_columns = df.dropna(axis=1)
dropped_columns
## City
## 0 New York
## 1 Los Angeles
## 2 Boston
## 3 Houston
## 4 Phoenix
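dropna also accepts parameters that control which missing values matter. A minimal sketch of three of them on the rebuilt sample DataFrame: `subset` limits the check to certain columns, `thresh` keeps rows with at least that many non-missing values, and `how='all'` drops a row only if every value is missing.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David', 'Eve'],
    'Age': [25, None, 35, 40, None],
    'City': ['New York', 'Los Angeles', 'Boston', 'Houston', 'Phoenix'],
})

# Only a missing Name causes a row to be dropped
named = df.dropna(subset=['Name'])
print(named.index.tolist())  # [0, 1, 3, 4]

# Keep rows with at least 3 non-missing values (all 3 columns filled here)
complete = df.dropna(thresh=3)
print(complete.index.tolist())  # [0, 3]

# No row is entirely missing, so nothing is dropped
all_missing_dropped = df.dropna(how='all')
print(len(all_missing_dropped))  # 5
```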
fillna
The fillna method replaces missing data with a value you supply, such as a constant string or a statistic computed from the data (like a column's mean).
# Fill missing data with a specific value
filled_data = df.fillna("Unknown")
filled_data
## Name Age City
## 0 Alice 25 New York
## 1 Bob Unknown Los Angeles
## 2 Unknown 35 Boston
## 3 David 40 Houston
## 4 Eve Unknown Phoenix
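Filling every column with "Unknown" puts strings into the numeric Age column. fillna also accepts a dict mapping column names to fill values, so each column can get a value of a suitable type. A minimal sketch on the rebuilt sample DataFrame; the name `filled` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David', 'Eve'],
    'Age': [25, None, 35, 40, None],
    'City': ['New York', 'Los Angeles', 'Boston', 'Houston', 'Phoenix'],
})

# Per-column fill values: a placeholder string for Name, the mean for Age
filled = df.fillna({'Name': 'Unknown', 'Age': df['Age'].mean()})
print(filled)

# Age stays numeric because it was filled with a float
print(filled['Age'].dtype)  # float64
```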
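# Fill missing ages with the mean age (mean() skips NaN by default)
mean_age = df['Age'].mean()
# Assign the result back; calling fillna(..., inplace=True) on df['Age'] is
# chained assignment and may not update df in recent pandas versions
df['Age'] = df['Age'].fillna(mean_age)
df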
## Name Age City
## 0 Alice 25.000000 New York
## 1 Bob 33.333333 Los Angeles
## 2 None 35.000000 Boston
## 3 David 40.000000 Houston
## 4 Eve 33.333333 Phoenix
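For ordered data, another option is to propagate neighboring values instead of using a statistic. A minimal sketch with Series.ffill and Series.bfill on a standalone copy of the original Age values (the name `ages` is illustrative); recent pandas versions favor these methods over passing method= to fillna.

```python
import pandas as pd

ages = pd.Series([25, None, 35, 40, None])

# Forward fill: each gap takes the last valid value before it
forward = ages.ffill()
print(forward.tolist())   # [25.0, 25.0, 35.0, 40.0, 40.0]

# Backward fill: each gap takes the next valid value; the trailing gap
# has no later value, so it stays NaN
backward = ages.bfill()
print(backward.tolist())  # [25.0, 35.0, 35.0, 40.0, nan]
```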
In summary, the isnull and notnull methods detect missing data, dropna drops rows or columns that contain missing data, and fillna replaces missing data with a specified value, such as a constant or a computed statistic.