Pandas DataFrame merge() function is used to merge two DataFrame objects with a database-style join operation. The joining is performed on columns or indexes. If the joining is done on columns, indexes are ignored. This function returns a new DataFrame and the source DataFrame objects are unchanged.
The merge() function syntax is:
def merge(
self,
right,
how="inner",
on=None,
left_on=None,
right_on=None,
left_index=False,
right_index=False,
sort=False,
suffixes=("_x", "_y"),
copy=True,
indicator=False,
validate=None,
)
Let’s look at some examples of merging two DataFrame objects.
import pandas as pd
d1 = {'Name': ['Pankaj', 'Meghna', 'Lisa'], 'Country': ['India', 'India', 'USA'], 'Role': ['CEO', 'CTO', 'CTO']}
df1 = pd.DataFrame(d1)
print('DataFrame 1:\n', df1)
df2 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Pankaj', 'Anupam', 'Amit']})
print('DataFrame 2:\n', df2)
df_merged = df1.merge(df2)
print('Result:\n', df_merged)
Output:
DataFrame 1:
Name Country Role
0 Pankaj India CEO
1 Meghna India CTO
2 Lisa USA CTO
DataFrame 2:
ID Name
0 1 Pankaj
1 2 Anupam
2 3 Amit
Result:
Name Country Role ID
0 Pankaj India CEO 1
print('Result Left Join:\n', df1.merge(df2, how='left'))
print('Result Right Join:\n', df1.merge(df2, how='right'))
print('Result Outer Join:\n', df1.merge(df2, how='outer'))
Output:
Result Left Join:
Name Country Role ID
0 Pankaj India CEO 1.0
1 Meghna India CTO NaN
2 Lisa USA CTO NaN
Result Right Join:
Name Country Role ID
0 Pankaj India CEO 1
1 Anupam NaN NaN 2
2 Amit NaN NaN 3
Result Outer Join:
Name Country Role ID
0 Pankaj India CEO 1.0
1 Meghna India CTO NaN
2 Lisa USA CTO NaN
3 Anupam NaN NaN 2.0
4 Amit NaN NaN 3.0
import pandas as pd
d1 = {'Name': ['Pankaj', 'Meghna', 'Lisa'], 'ID': [1, 2, 3], 'Country': ['India', 'India', 'USA'],
'Role': ['CEO', 'CTO', 'CTO']}
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Pankaj', 'Anupam', 'Amit']})
print(df1.merge(df2, on='ID'))
print(df1.merge(df2, on='Name'))
Output:
Name_x ID Country Role Name_y
0 Pankaj 1 India CEO Pankaj
1 Meghna 2 India CTO Anupam
2 Lisa 3 USA CTO Amit
Name ID_x Country Role ID_y
0 Pankaj 1 India CEO 1
import pandas as pd
d1 = {'Name': ['Pankaj', 'Meghna', 'Lisa'], 'ID1': [1, 2, 3], 'Country': ['India', 'India', 'USA'],
'Role': ['CEO', 'CTO', 'CTO']}
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame({'ID2': [1, 2, 3], 'Name': ['Pankaj', 'Anupam', 'Amit']})
print(df1.merge(df2))
print(df1.merge(df2, left_on='ID1', right_on='ID2'))
Output;
Name ID1 Country Role ID2
0 Pankaj 1 India CEO 1
Name_x ID1 Country Role ID2 Name_y
0 Pankaj 1 India CEO 1 Pankaj
1 Meghna 2 India CTO 2 Anupam
2 Lisa 3 USA CTO 3 Amit
import pandas as pd
d1 = {'Name': ['Pankaj', 'Meghna', 'Lisa'], 'Country': ['India', 'India', 'USA'], 'Role': ['CEO', 'CTO', 'CTO']}
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Pankaj', 'Anupam', 'Amit']})
df_merged = df1.merge(df2)
print('Result Default Merge:\n', df_merged)
df_merged = df1.merge(df2, left_index=True, right_index=True)
print('\nResult Index Merge:\n', df_merged)
Output:
Result Default Merge:
Name Country Role ID
0 Pankaj India CEO 1
Result Index Merge:
Name_x Country Role ID Name_y
0 Pankaj India CEO 1 Pankaj
1 Meghna India CTO 2 Anupam
2 Lisa USA CTO 3 Amit
Thanks for learning with the DigitalOcean Community. Check out our offerings for compute, storage, networking, and managed databases.
While we believe that this content benefits our community, we have not yet thoroughly reviewed it. If you have any suggestions for improvements, please let us know by clicking the “report an issue“ button at the bottom of the tutorial.