Pandas DataFrame.dropna()
If your dataset contains null values, you can use the dropna() function to find and drop the rows or columns that contain them.
Syntax:
DataFrameName.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
Parameters:
- axis : {0 or 'index', 1 or 'columns'}, default 0
  It determines whether rows or columns that contain missing values are removed. It accepts an integer (0 or 1) or a string ('index' or 'columns').
  - 0, or 'index': Drop the rows that contain missing values.
  - 1, or 'columns': Drop the columns that contain missing values.
- how : {'any', 'all'}, default 'any'
  It determines whether a row or column is removed when it has at least one NA value or only when all of its values are NA.
  - 'any': Drop the row/column if any value is null.
  - 'all': Drop the row/column only if all values are null.
- thresh : int, optional
  The minimum number of non-NA values a row or column must contain to be kept. Rows/columns with fewer non-NA values are dropped. In recent pandas versions, thresh cannot be combined with how.
- subset : label or list of labels, optional
  It limits the check for missing values to the given labels along the other axis. For example, when dropping rows, it is the list of columns to inspect.
- inplace : bool, default False
  It takes a boolean value. If True, the changes are made in the DataFrame itself and nothing is returned.
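The parameters above can be compared side by side on a small DataFrame. The following is a minimal sketch: the data and the variable names are made up for illustration and are not part of the original example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for this illustration only.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan, np.nan],
    "b": [4.0, 5.0, np.nan, np.nan, np.nan],
    "c": [7.0, 8.0, 9.0, 9.0, np.nan],
})

# Default (axis=0, how='any'): keep only rows with no missing values.
rows_any = df.dropna()

# how='all': drop a row only when every value in it is missing.
rows_all = df.dropna(how="all")

# thresh=2: keep rows that have at least 2 non-missing values.
rows_thresh = df.dropna(thresh=2)

# subset=['a']: only column 'a' is checked when deciding which rows to drop.
rows_subset = df.dropna(subset=["a"])

# axis=1 with thresh=3: keep columns that have at least 3 non-missing values.
cols_thresh = df.dropna(axis=1, thresh=3)

print(list(rows_any.index))       # [0]
print(list(rows_all.index))       # [0, 1, 2, 3]
print(list(rows_thresh.index))    # [0, 1, 2]
print(list(rows_subset.index))    # [0, 2]
print(list(cols_thresh.columns))  # ['c']
```

Note that thresh counts the values that are present, not the values that are missing.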
Returns
It returns a DataFrame from which the NA entries have been dropped, or None if inplace=True.
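The difference between the default call and inplace=True can be sketched as follows. The DataFrame here is hypothetical and exists only to show which object ends up modified.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": ["a", "b", None]})

# Default call: a new DataFrame is returned and df is left unchanged.
cleaned = df.dropna()
print(len(df), len(cleaned))  # 3 1

# inplace=True: df itself is modified, so the return value is not a new frame.
result = df.dropna(inplace=True)
print(len(df))  # 1
```

Because the in-place call does not hand back a new DataFrame, writing `df = df.dropna(inplace=True)` would overwrite df with the call's return value instead of the cleaned data.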
For demonstration, we first load a CSV file and preview its first rows. An all-null column will then be added to this dataset and dropped with dropna().
import pandas as pd

aa = pd.read_csv("aa.csv")
aa.head()
Output
Name | Hire Date | Salary | Leaves Remaining |
---|---|---|---|
0 John Idle | 03/15/14 | 50000.0 | 10 |
1 Smith Gilliam | 06/01/15 | 65000.0 | 8 |
2 Parker Chapman | 05/12/14 | 45000.0 | 10 |
3 Jones Palin | 11/01/13 | 70000.0 | 3 |
4 Terry Gilliam | 08/12/14 | 48000.0 | 7 |
5 Michael Palin | 05/23/13 | 66000.0 | 8 |
Code:
# importing pandas module
import pandas as pd

# making data frame from csv file
info = pd.read_csv("aa.csv")

# making a copy of old data frame
copy = pd.read_csv("aa.csv")

# creating a column with all null values in the new data frame
copy["Null Column"] = None

# checking if column is inserted properly
print(info.columns.values, "\n", copy.columns.values)

# comparing column counts before dropping null column
print("\nColumn number before dropping Null column\n", len(info.dtypes), len(copy.dtypes))

# dropping column with all null values
copy.dropna(axis=1, how='all', inplace=True)

# comparing column counts after dropping null column
print("\nColumn number after dropping Null column\n", len(info.dtypes), len(copy.dtypes))
Output
[' Name Hire Date Salary Leaves Remaining']
 [' Name Hire Date Salary Leaves Remaining' 'Null Column']

Column number before dropping Null column
 1 2

Column number after dropping Null column
 1 1
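Since aa.csv may not be at hand, the same idea can be tried without a file. This sketch assumes the data can be built in memory from the rows in the table shown earlier; the variable names `with_null`, `before`, and `after` are chosen here for illustration.

```python
import pandas as pd

# Rows copied from the table shown earlier; building the frame in memory
# (instead of reading aa.csv) is an assumption made so the sketch runs as-is.
info = pd.DataFrame({
    "Name": ["John Idle", "Smith Gilliam", "Parker Chapman",
             "Jones Palin", "Terry Gilliam", "Michael Palin"],
    "Hire Date": ["03/15/14", "06/01/15", "05/12/14",
                  "11/01/13", "08/12/14", "05/23/13"],
    "Salary": [50000.0, 65000.0, 45000.0, 70000.0, 48000.0, 66000.0],
    "Leaves Remaining": [10, 8, 10, 3, 7, 8],
})

# Work on a copy and add a column that contains only null values.
with_null = info.copy()
with_null["Null Column"] = None
before = with_null.shape[1]

# Drop every column whose values are all null.
with_null.dropna(axis=1, how="all", inplace=True)
after = with_null.shape[1]

print(before, after)  # 5 4
```

Only the all-null column is removed; the four original columns are kept because none of them is entirely null.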