How to drop duplicate values from a DataFrame in Python?

vishrut-singhal · 21 May 2021 17:06

Pandas DataFrame.drop_duplicates()

The drop_duplicates() method performs a common data cleaning task: it removes duplicate rows from a DataFrame.

Syntax

DataFrame.drop_duplicates(subset=None, keep='first', inplace=False)

Parameters

subset: A column label or a list of column labels. Only these columns are considered when identifying duplicates. The default is None, which means all columns are compared.
keep: Controls which of the duplicate rows are kept. It accepts three values:
- 'first': Drops the duplicates except for the first occurrence. This is the default.
- 'last': Drops the duplicates except for the last occurrence.
- False: Drops all the duplicates, so no copy of a repeated row is kept.
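inplace: Takes a boolean value. Default value is False.

If it is True, the duplicate rows are removed from the original DataFrame itself and the method returns None. If it is False, the original DataFrame is left unchanged and a new DataFrame is returned.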
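As a rough illustration of how subset and keep interact, here is a minimal sketch. The data is a hypothetical variation of the employee example used in this post (the last Age is changed to 40 so that only the Name column repeats), and the variable names are made up for the example.

```python
import pandas as pd

# Hypothetical sample data: "Parker" appears twice, but with different ages.
df = pd.DataFrame({
    "Name": ["Parker", "Smith", "William", "Parker"],
    "Age": [21, 32, 29, 40],
})

# Only "Name" is compared, so rows 0 and 3 count as duplicates of each other.
first_kept = df.drop_duplicates(subset=["Name"], keep="first")
last_kept = df.drop_duplicates(subset=["Name"], keep="last")
none_kept = df.drop_duplicates(subset=["Name"], keep=False)

print(first_kept.index.tolist())  # [0, 1, 2]
print(last_kept.index.tolist())   # [1, 2, 3]
print(none_kept.index.tolist())   # [1, 2]
```

Without subset, none of these rows would be dropped, because no two rows match in every column.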

Return

It returns a new DataFrame with the duplicate rows removed, or None if inplace=True.
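The effect of inplace on the return value can be checked with a short sketch like the one below. It reuses the employee data from this post; the names deduped and result are only illustrative.

```python
import pandas as pd

info = pd.DataFrame({
    "Name": ["Parker", "Smith", "William", "Parker"],
    "Age": [21, 32, 29, 21],
})

# Default (inplace=False): a new DataFrame comes back, info is unchanged.
deduped = info.drop_duplicates()
print(len(info), len(deduped))  # 4 3

# inplace=True: info itself is modified and the call returns None.
result = info.drop_duplicates(inplace=True)
print(result is None, len(info))  # True 3
```

A common mistake is writing info = info.drop_duplicates(inplace=True), which would replace info with None.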

Example

import pandas as pd

# Sample data: the first and last rows are identical
emp = {"Name": ["Parker", "Smith", "William", "Parker"],
       "Age": [21, 32, 29, 21]}
info = pd.DataFrame(emp)
print(info)

Output

      Name  Age
0   Parker   21
1    Smith   32
2  William   29
3   Parker   21

import pandas as pd

emp = {"Name": ["Parker", "Smith", "William", "Parker"],
       "Age": [21, 32, 29, 21]}
info = pd.DataFrame(emp)
# Remove rows that duplicate an earlier row in every column
info = info.drop_duplicates()
print(info)

Output

      Name  Age
0   Parker   21
1    Smith   32
2  William   29