In this tutorial, we will learn how to combine DataFrames using merge, join, and concat from the Pandas library. Each one combines DataFrames in a different way: merge matches rows on one or more key columns, join matches rows on the index, and concat stacks DataFrames along an axis. Knowing which one to reach for is an essential part of data analysis.

First, we will import the necessary libraries and create some example DataFrames to work with.

import pandas as pd

# Create example DataFrames
data1 = {
    'key': ['A', 'B', 'C', 'D'],
    'value': [1, 2, 3, 4]
}
df1 = pd.DataFrame(data1)

data2 = {
    'key': ['B', 'D', 'E', 'F'],
    'value': [5, 6, 7, 8]
}
df2 = pd.DataFrame(data2)

print(df1)
##   key  value
## 0   A      1
## 1   B      2
## 2   C      3
## 3   D      4
print(df2)
##   key  value
## 0   B      5
## 1   D      6
## 2   E      7
## 3   F      8

Pandas’ merge function allows us to combine DataFrames based on one or more keys. By default, merge performs an ‘inner join’, which means that only the keys present in both DataFrames will be included in the result.

Use merge when you want to combine DataFrames based on a shared key.

# Merge df1 and df2
merged = df1.merge(df2, on='key')
print(merged)
##   key  value_x  value_y
## 0   B        2        5
## 1   D        4        6

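As you can see, the result includes only the keys ‘B’ and ‘D’, which are present in both DataFrames. Because both DataFrames have a column named ‘value’, merge appends the default suffixes ‘_x’ and ‘_y’ to tell them apart: ‘value_x’ holds the values from df1 and ‘value_y’ holds the values from df2.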
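The suffixes that merge adds to overlapping column names can be replaced through its suffixes argument. The following is a minimal sketch that rebuilds the two example DataFrames so it runs on its own; the name merged_named and the suffix strings ‘_df1’ and ‘_df2’ are illustrative choices, not part of the original example.

```python
import pandas as pd

# Recreate the example DataFrames so this snippet is self-contained
df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
df2 = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': [5, 6, 7, 8]})

# Inner merge on 'key', labelling the overlapping 'value' columns explicitly
# (the suffix strings are arbitrary; pick whatever reads best in your data)
merged_named = df1.merge(df2, on='key', suffixes=('_df1', '_df2'))
print(merged_named)
# The columns are now: key, value_df1, value_df2
```

Descriptive suffixes make it easier to see which source DataFrame each column came from once the result is passed further down a pipeline.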

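We can also perform ‘outer’, ‘left’, and ‘right’ joins using the how argument of the merge function. Keys without a match in the other DataFrame get NaN in the missing column, which also converts that integer column to float.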

# Outer join
outer = df1.merge(df2, on='key', how='outer')
print(outer)
##   key  value_x  value_y
## 0   A      1.0      NaN
## 1   B      2.0      5.0
## 2   C      3.0      NaN
## 3   D      4.0      6.0
## 4   E      NaN      7.0
## 5   F      NaN      8.0

# Left join
left = df1.merge(df2, on='key', how='left')
print(left)
##   key  value_x  value_y
## 0   A        1      NaN
## 1   B        2      5.0
## 2   C        3      NaN
## 3   D        4      6.0

# Right join
right = df1.merge(df2, on='key', how='right')
print(right)
##   key  value_x  value_y
## 0   B      2.0        5
## 1   D      4.0        6
## 2   E      NaN        7
## 3   F      NaN        8
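When reading the result of an outer join, it can be hard to tell which side each row came from. One way to make this explicit is the indicator argument of merge, which adds a ‘_merge’ column. The sketch below rebuilds the example DataFrames so it runs on its own; the name outer_flagged is an illustrative choice.

```python
import pandas as pd

# Recreate the example DataFrames so this snippet is self-contained
df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
df2 = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': [5, 6, 7, 8]})

# indicator=True adds a '_merge' column whose value is one of
# 'left_only', 'right_only', or 'both'
outer_flagged = df1.merge(df2, on='key', how='outer', indicator=True)
print(outer_flagged[['key', '_merge']])

# Keep only the rows whose key was found in df1 but not in df2
only_in_df1 = outer_flagged[outer_flagged['_merge'] == 'left_only']
print(only_in_df1['key'].tolist())
```

Filtering on ‘_merge’ in this way is a common pattern for finding records that are missing from one of the two sources.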

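Pandas’ join method is similar to merge, but by default it matches rows on the indices of the DataFrames rather than on columns. It performs a ‘left join’ unless told otherwise, so every row of the calling DataFrame is kept. Unlike merge, join does not add suffixes automatically: when both DataFrames share a column name, as with ‘value’ here, lsuffix and rsuffix must be supplied.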

Use join when you want to combine DataFrames based on their indices.

# Set 'key' as index in df1 and df2
df1_indexed = df1.set_index('key')
df2_indexed = df2.set_index('key')

# Join df1_indexed and df2_indexed
joined = df1_indexed.join(df2_indexed, lsuffix='_df1', rsuffix='_df2')
print(joined)
##      value_df1  value_df2
## key                      
## A            1        NaN
## B            2        5.0
## C            3        NaN
## D            4        6.0
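Like merge, join accepts a how argument, so the left join shown above can be switched to an inner or outer join on the index. The following is a minimal sketch that rebuilds the indexed DataFrames so it runs on its own; the names inner_joined and outer_joined are illustrative.

```python
import pandas as pd

# Recreate the example DataFrames and index them by 'key'
df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
df2 = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': [5, 6, 7, 8]})
df1_indexed = df1.set_index('key')
df2_indexed = df2.set_index('key')

# Inner join: keep only index labels present in both DataFrames
inner_joined = df1_indexed.join(
    df2_indexed, how='inner', lsuffix='_df1', rsuffix='_df2'
)
print(inner_joined)

# Outer join: keep every index label from either DataFrame
outer_joined = df1_indexed.join(
    df2_indexed, how='outer', lsuffix='_df1', rsuffix='_df2'
)
print(outer_joined)
```

The inner join keeps only ‘B’ and ‘D’, while the outer join keeps all six keys and fills the gaps with NaN.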

Pandas’ concat function is used to concatenate two or more DataFrames along an axis: rows (axis=0, the default) or columns (axis=1). Unlike merge and join, it does not match rows on a key.

Use concat when you want to combine DataFrames vertically or horizontally.

# Concat df1 and df2
concatenated = pd.concat([df1, df2])
print(concatenated)
##   key  value
## 0   A      1
## 1   B      2
## 2   C      3
## 3   D      4
## 0   B      5
## 1   D      6
## 2   E      7
## 3   F      8

As you can see, concat has stacked df1 and df2 into a single DataFrame. The original indices from df1 and df2 have been preserved, so each of the index labels 0 to 3 now appears twice in the result.
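Two common variations on the call above are renumbering the rows and concatenating horizontally. The sketch below rebuilds the example DataFrames so it runs on its own; the names stacked and side_by_side are illustrative.

```python
import pandas as pd

# Recreate the example DataFrames so this snippet is self-contained
df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
df2 = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': [5, 6, 7, 8]})

# Vertical concatenation with a fresh 0..7 index instead of repeated labels
stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked)

# Horizontal concatenation: rows are aligned on the index labels 0..3,
# so the result has four columns (key, value, key, value)
side_by_side = pd.concat([df1, df2], axis=1)
print(side_by_side)
```

Note that the horizontal result contains duplicate column names, because concat aligns rows by index label and does not rename overlapping columns the way merge does.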