One-hot encoding converts categorical data into a numeric format that machine learning algorithms can work with: each category of a feature becomes its own binary (0/1) column.
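To make the idea concrete before turning to pandas, here is a minimal sketch in plain Python. The helper name one_hot and the choice to sort the categories alphabetically are assumptions made for illustration only; they are not part of any library.

```python
def one_hot(values):
    """Return (categories, rows); each row has a single 1 marking its category."""
    # Assumption: use a fixed, alphabetically sorted category order.
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


categories, rows = one_hot(['USA', 'Canada', 'Australia'])
print(categories)  # ['Australia', 'Canada', 'USA']
print(rows)        # [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
```

Each input value maps to a row containing exactly one 1, in the position of its category.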
Let’s create an example DataFrame with some categorical features.
import pandas as pd
# Creating example data
data = {
    'Name': ['John', 'Mike', 'Sara'],
    'Gender': ['Male', 'Male', 'Female'],
    'Country': ['USA', 'Canada', 'Australia'],
}
# Create DataFrame
df = pd.DataFrame(data)
# Display the DataFrame
print(df)
##    Name  Gender    Country
## 0  John    Male        USA
## 1  Mike    Male     Canada
## 2  Sara  Female  Australia
We will use the pandas function get_dummies() to one-hot encode the ‘Gender’ and ‘Country’ columns.
# Perform one-hot encoding; dtype=int gives 0/1 columns
# (recent pandas versions default to True/False)
encoded_df = pd.get_dummies(df, columns=['Gender', 'Country'], dtype=int)
# Display the encoded DataFrame
print(encoded_df)
##    Name  Gender_Female  ...  Country_Canada  Country_USA
## 0  John              0  ...               0            1
## 1  Mike              0  ...               1            0
## 2  Sara              1  ...               0            0
##
## [3 rows x 6 columns]
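The printed output is truncated to fit the display width, so some of the six columns are hidden behind the "...". One way to see every generated column is to list the column names. This is a small sketch that rebuilds the same example data; the variable name column_names is just illustrative.

```python
import pandas as pd

# Same example data as above
df = pd.DataFrame({
    'Name': ['John', 'Mike', 'Sara'],
    'Gender': ['Male', 'Male', 'Female'],
    'Country': ['USA', 'Canada', 'Australia'],
})

encoded_df = pd.get_dummies(df, columns=['Gender', 'Country'])

# List all column names instead of relying on the truncated table display
column_names = encoded_df.columns.tolist()
print(column_names)
# ['Name', 'Gender_Female', 'Gender_Male', 'Country_Australia', 'Country_Canada', 'Country_USA']
```

Each new column is named after the original column and the category, joined by an underscore.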
We have one-hot encoded the ‘Gender’ and ‘Country’ columns using pandas in Python. Each category now has its own indicator column, so these features can be passed to machine learning models that require numeric input.
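Before training, the remaining text column usually has to be set aside, since ‘Name’ is still a string. The following is a rough sketch of that step, assuming ‘Name’ is only an identifier and not a feature; the variable name feature_matrix is hypothetical.

```python
import pandas as pd

# Same example data as above
df = pd.DataFrame({
    'Name': ['John', 'Mike', 'Sara'],
    'Gender': ['Male', 'Male', 'Female'],
    'Country': ['USA', 'Canada', 'Australia'],
})

# dtype=int produces 0/1 indicator columns
encoded_df = pd.get_dummies(df, columns=['Gender', 'Country'], dtype=int)

# Assumption: 'Name' is an identifier, so drop it and keep only numeric features
features = encoded_df.drop(columns=['Name'])

# Convert to a NumPy array, the input format many ML libraries accept
feature_matrix = features.to_numpy()
print(feature_matrix.shape)  # (3, 5)
```

Whether an identifier column such as ‘Name’ should be dropped depends on the task; here it is removed only because it carries no numeric information.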