Drop Rows/Columns if values are NA in DataFrame
To remove rows or columns of a DataFrame based on the NA values in them, call the dropna() method on that DataFrame. Its parameters specify the axis along which to drop, the condition for dropping, and the minimum number of non-NA values a row or column must have to be kept.
The how parameter of dropna() sets the condition for dropping: drop a row or column when any of its values is NA, or only when all of its values are NA.
In this tutorial, we will learn the syntax of DataFrame.dropna() method and how to use this method to delete or drop rows or columns containing NA.
Syntax
The syntax of pandas DataFrame.dropna() method is
DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
where
Parameter | Value | Description |
---|---|---|
axis | {0 or ‘index’, 1 or ‘columns’}, default 0 | Determines whether rows or columns containing missing values are removed. 0 or ‘index’: drop rows that contain missing values. 1 or ‘columns’: drop columns that contain missing values. Changed in version 1.0.0: passing a tuple or list to drop on multiple axes is no longer supported; only a single axis is allowed. |
how | {‘any’, ‘all’}, default ‘any’ | Determines whether a row or column is removed when it has at least one NA value or only when all of its values are NA. ‘any’: if any NA values are present, drop that row or column. ‘all’: if all values are NA, drop that row or column. |
thresh | int, optional | Minimum number of non-NA values a row or column must have to be kept. In recent pandas versions, thresh cannot be combined with how. |
subset | array-like, optional | Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include. |
inplace | bool, default False | If True, do operation inplace and return None. |
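The subset parameter from the table restricts the NA check to the labels you list. The following is a minimal sketch with made-up data (the column names name, score, and grade are assumptions for illustration only): rows are dropped only when score is NA, regardless of grade.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'name' is never missing, 'score' and 'grade' sometimes are.
df = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd'],
    'score': [1.0, np.nan, 3.0, np.nan],
    'grade': [np.nan, 'B', 'C', np.nan],
})

# Only look at 'score' when deciding which rows to drop;
# NA values in 'grade' are ignored.
result = df.dropna(subset=['score'])
print(result)
```

Rows 1 and 3 are removed because their score is NA. Row 0 is kept even though its grade is NA.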
Return Value
- DataFrame.
- None if inplace=True.
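The two return cases can be checked side by side. This is a small sketch with illustrative data, not part of the original examples: the default call returns a new DataFrame, while inplace=True modifies the original and returns None.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col_0': [10, np.nan, 30], 'col_1': [1, 2, 3]})

# Default: a new DataFrame is returned and df is left untouched.
cleaned = df.dropna()
print(len(df), len(cleaned))

# inplace=True: df itself is modified and the call returns None.
returned = df.dropna(inplace=True)
print(returned, len(df))
```

Because the in-place call returns None, assigning its result back to df (df = df.dropna(inplace=True)) would discard the DataFrame.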
Examples
Delete Rows of DataFrame if Values are NA
In the following program, we take a DataFrame and drop every row that contains at least one NA value.
The axis parameter defaults to 0, so calling dropna() without arguments deletes the rows of the DataFrame that contain NA.
Example.py
import pandas as pd
import numpy as np
data = {'col_0': [10, 20, np.nan, 40, np.nan], 'col_1': [60, 70, 80, 90, 99]}
df = pd.DataFrame(data)
result = df.dropna()
print(result)
Output
   col_0  col_1
0   10.0     60
1   20.0     70
3   40.0     90
The third and fifth rows contain an NA (numpy.nan) value, so those rows are dropped from the resulting DataFrame.
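To see which rows are affected before dropping them, the same selection can be written as a boolean mask. This is a sketch of an equivalent approach using isna() and any(), not how dropna() is implemented internally.

```python
import numpy as np
import pandas as pd

data = {'col_0': [10, 20, np.nan, 40, np.nan], 'col_1': [60, 70, 80, 90, 99]}
df = pd.DataFrame(data)

# True for every row that contains at least one NA value.
rows_with_na = df.isna().any(axis=1)
print(rows_with_na.tolist())

# Keeping the rows without NA gives the same result as df.dropna().
manual = df[~rows_with_na]
print(manual.equals(df.dropna()))
```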
Delete Columns of DataFrame if Values are NA
In the following program, we take a DataFrame and drop every column that contains at least one NA value.
Pass axis=1 to drop columns containing NA values.
Example.py
import pandas as pd
import numpy as np
data = {'col_0': [10, 20, np.nan, 40, np.nan], 'col_1': [60, 70, 80, 90, 99]}
df = pd.DataFrame(data)
result = df.dropna(axis=1)
print(result)
Output
   col_1
0     60
1     70
2     80
3     90
4     99
Since the first column, col_0, contains NA values, it is dropped from the resulting DataFrame.
Delete Rows of DataFrame if All Values are NA
In the following program, we take a DataFrame, and drop rows from this DataFrame only if all of the values in that row are NA.
Pass how='all' to drop only the rows in which all values are NA.
Example.py
import pandas as pd
import numpy as np
data = {'col_0': [10, 20, np.nan], 'col_1': [np.nan, 50, np.nan]}
df = pd.DataFrame(data)
result = df.dropna(how='all')
print(result)
Output
   col_0  col_1
0   10.0    NaN
1   20.0   50.0
The third row has all NA values, and therefore this row is dropped from the resulting DataFrame. The first row has only one NA value, so it is kept.
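The how='all' condition works the same way for columns when combined with axis=1. A minimal sketch, assuming made-up data in which only col_1 is entirely NA:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'col_1' is entirely NA, 'col_0' only partly.
df = pd.DataFrame({
    'col_0': [10, np.nan, 30],
    'col_1': [np.nan, np.nan, np.nan],
    'col_2': [1, 2, 3],
})

# Drop only the columns in which every value is NA.
result = df.dropna(axis=1, how='all')
print(list(result.columns))
```

Here col_0 is kept despite its single NA value, because how='all' requires every value in the column to be NA.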
Delete Columns of DataFrame if Fewer than Threshold Number of Values are non-NA
We can also specify a threshold: the minimum number of non-NA values a row or column must have to be kept.
In the following program, we take a DataFrame and drop every column that has fewer than 2 non-NA values.
Pass thresh=2 together with axis=1 to drop columns that do not contain at least 2 non-NA values.
Example.py
import pandas as pd
import numpy as np
data = {'col_0': [10, 20, np.nan, 40], 'col_1': [np.nan, 50, np.nan, 70], 'col_2': [np.nan, 90, np.nan, np.nan]}
df = pd.DataFrame(data)
result = df.dropna(axis=1, thresh=2)
print(result)
Output
   col_0  col_1
0   10.0    NaN
1   20.0   50.0
2    NaN    NaN
3   40.0   70.0
col_0 and col_1 each have at least two non-NA values, while col_2 has only one. So col_2 is dropped from the resulting DataFrame.
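The threshold applies to rows as well when axis is left at its default of 0. The sketch below reuses the same data and first prints the non-NA count per row (computed with notna().sum(axis=1), used here only to make the rule visible), then keeps the rows that reach the threshold.

```python
import numpy as np
import pandas as pd

data = {'col_0': [10, 20, np.nan, 40],
        'col_1': [np.nan, 50, np.nan, 70],
        'col_2': [np.nan, 90, np.nan, np.nan]}
df = pd.DataFrame(data)

# Number of non-NA values in each row.
non_na_per_row = df.notna().sum(axis=1)
print(non_na_per_row.tolist())

# Keep only the rows with at least 2 non-NA values.
result = df.dropna(thresh=2)
print(result)
```

The rows at index 0 and 2 have fewer than two non-NA values, so only the rows at index 1 and 3 remain.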
Conclusion
In this Pandas Tutorial, we learned the syntax of the DataFrame.dropna() method and how to use it to drop rows or columns based on the NA values in them.