How can you handle duplicate values in a dataset for a variable in Python?

pranav-kapse-a8e0a157 · 30 August 2020 06:35

iftekar-patel-f1e6bf65 · 5 September 2020 01:45

Checking for duplicates

First, check whether your dataset contains duplicate records at all. If it doesn’t, you can skip the rest of this post. The call below returns a boolean Series that is True for every row whose values match an earlier row in every column (by default the first occurrence is not flagged).

df.duplicated()
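
Since the question asks about a single variable, here is a small sketch of how the check might look in practice. The DataFrame and its column names ("name", "city") are made up for illustration; `subset` is the pandas parameter that limits the comparison to chosen columns.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Ann", "Cid"],
    "city": ["Pune", "Delhi", "Pune", "Pune"],
})

# Boolean Series: True for each row that repeats an earlier row.
dup_mask = df.duplicated()

# Number of duplicate rows (first occurrences are not counted by default).
n_duplicates = int(dup_mask.sum())

# The duplicate rows themselves.
duplicate_rows = df[dup_mask]

# Check a single variable (column) instead of the whole row.
city_dup_mask = df.duplicated(subset=["city"])
```

Summing the mask gives a quick count, which is usually enough to decide whether any cleanup is needed.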

Getting rid of duplicates

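Getting rid of duplicate records is easy. The call below returns a new DataFrame with the repeated rows removed and leaves df itself unchanged, so assign the result if you want to keep it: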

df.drop_duplicates()
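
As a rough sketch, the same call can also deduplicate on one variable only. The sample data and column names below are hypothetical; `subset` and `keep` are standard parameters of `drop_duplicates`, and keeping the last occurrence is just one possible choice.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Ann", "Cid"],
    "city": ["Pune", "Delhi", "Pune", "Pune"],
})

# drop_duplicates() returns a new DataFrame; df itself is unchanged.
deduped = df.drop_duplicates()

# Deduplicate on one variable only, keeping the last occurrence of each value.
by_city = df.drop_duplicates(subset=["city"], keep="last")

# Optionally renumber the index after rows have been dropped.
by_city_reset = by_city.reset_index(drop=True)
```

Which occurrence to keep depends on the data: `keep="first"` is the default, and `keep=False` drops every row that has a duplicate.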