Pandas DataFrame sorting: sort_index() and sort_values()
A DataFrame can be sorted in two ways:
- By label
- By actual value
Before looking at these two kinds of sorting, we first create a dataset for demonstration:
import pandas as pd
import numpy as np

info = pd.DataFrame(np.random.randn(10, 2), index=[1, 3, 7, 2, 4, 5, 9, 8, 0, 6], columns=['col2', 'col1'])
print(info)
Output
col2 col1
1 -0.456763 -0.931156
3 0.242766 -0.793590
7 1.133803 0.454363
2 -0.843520 -0.938268
4 -0.018571 -0.315972
5 -1.951544 -1.300100
9 -0.711499 0.031491
8 1.648080 0.695637
0 2.576250 -0.625171
6 -0.301717 0.879970
In the above DataFrame, the labels and the values are unsorted. So, let’s see how it can be sorted:
By Label
A DataFrame can be sorted by its labels with the sort_index() method, which accepts an axis argument and a sort order. By default, it sorts the row labels in ascending order and returns a new DataFrame.
Example
import pandas as pd
import numpy as np

info = pd.DataFrame(np.random.randn(10, 2), index=[1, 2, 5, 4, 8, 7, 9, 3, 0, 6], columns=['col4', 'col3'])
info2 = info.sort_index()
print(info2)
Output
col4 col3
0 0.698346 1.897573
1 1.247655 -1.208908
2 -0.469820 -0.546918
3 -0.793445 0.362020
4 -1.184855 -1.596489
5 1.500156 -0.397635
6 -1.239635 -0.255545
7 1.110986 -0.681728
8 -1.797474 0.108840
9 0.063048 1.512421
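The same call can be tried on a small frame with fixed values, which makes the result reproducible. This is a minimal sketch; the frame demo and its values are hypothetical and only serve to show that sort_index() returns a new object and leaves the original untouched.

```python
import pandas as pd

# Hypothetical data with fixed values and an unsorted index.
demo = pd.DataFrame(
    {"col4": [10, 20, 30, 40], "col3": [1.5, 2.5, 3.5, 4.5]},
    index=[2, 0, 3, 1],
)

# sort_index() sorts the row labels in ascending order by default
# and returns a new DataFrame.
sorted_demo = demo.sort_index()

print(sorted_demo)
print(demo.index.tolist())  # the original keeps its order: [2, 0, 3, 1]
```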
Order of Sorting
The order of sorting can be controlled by passing a Boolean value to the ascending parameter. Passing ascending=False sorts the labels in descending order.
Example:
import pandas as pd
import numpy as np

info = pd.DataFrame(np.random.randn(10, 2), index=[1, 4, 7, 2, 5, 3, 0, 8, 9, 6], columns=['col4', 'col5'])
info_2 = info.sort_index(ascending=False)
# Print the sorted result, not the original frame
print(info_2)
Output
col4 col5
9 0.656063 0.122720
8 0.208479 -1.234640
7 0.537063 -0.774384
6 0.347990 -0.410401
5 0.331764 -0.741020
4 -0.456203 -1.255311
3 -0.082334 0.304390
2 -1.937455 0.257315
1 0.664336 -1.846533
0 -0.983810 -0.711582
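Because the random values differ on every run, a fixed dataset makes the effect of ascending=False easier to check. The following is a small sketch; the frame demo and its values are hypothetical.

```python
import pandas as pd

# Hypothetical data with fixed values and an unsorted index.
demo = pd.DataFrame({"col4": [10, 20, 30, 40]}, index=[2, 0, 3, 1])

# ascending=False sorts the row labels from largest to smallest.
descending = demo.sort_index(ascending=False)

print(descending)
print(descending.index.tolist())  # [3, 2, 1, 0]
```

Each row keeps its own value, so the label 3 still carries 30 and the label 0 still carries 20.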
Sort the Columns:
We can sort the column labels instead of the row labels by passing axis=1 to sort_index(). The default is axis=0, which sorts by row labels.
Example:
import pandas as pd
import numpy as np

info = pd.DataFrame(np.random.randn(10, 2), index=[1, 4, 8, 2, 0, 6, 7, 5, 3, 9], columns=['col4', 'col7'])
info_2 = info.sort_index(axis=1)
print(info_2)
Output
col4 col7
1 -0.509367 -1.609514
4 -0.516731 0.397375
8 -0.201157 -0.009864
2 1.440567 1.058436
0 0.955486 -0.009777
6 -1.211133 0.415147
7 0.095644 0.531727
5 -0.881241 -0.871342
3 0.206327 -1.154724
9 1.418127 0.146788
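In the example above the column labels col4 and col7 are already in order, so the output looks the same as the input. The sketch below uses a hypothetical frame whose columns start out of order, so the effect of axis=1 is visible.

```python
import pandas as pd

# Hypothetical data: columns are deliberately listed out of order.
demo = pd.DataFrame({"col7": [1, 2], "col4": [3, 4]}, index=["b", "a"])

# axis=1 sorts the column labels; the row order is left as it is.
by_columns = demo.sort_index(axis=1)

print(by_columns)
print(by_columns.columns.tolist())  # ['col4', 'col7']
print(by_columns.index.tolist())    # ['b', 'a']
```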
By Actual Value
The second way to sort a DataFrame is by its values, using the sort_values() method.
The by argument names the column, or list of columns, whose values determine the order of the rows.
Example:
import pandas as pd
import numpy as np

info = pd.DataFrame({'col1': [7, 1, 8, 3], 'col2': [8, 12, 4, 9]})
info_2 = info.sort_values(by='col2')
print(info_2)
Output
col1 col2
2 8 4
0 7 8
3 3 9
1 1 12
In the above output, the rows are ordered by the values in col2. Each row moves as a whole, so the col1 values and the row index labels travel with their col2 value and therefore appear unsorted.
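The by argument also accepts a list of columns, and ascending can then be a list with one Boolean per column. The following sketch assumes a hypothetical frame with a tie in col1, so the second column is needed to break it.

```python
import pandas as pd

# Hypothetical data: col1 contains a tie (two rows with 7).
demo = pd.DataFrame({"col1": [7, 1, 7, 3], "col2": [8, 12, 4, 9]})

# Sort by col1 descending, then by col2 ascending within equal col1 values.
multi = demo.sort_values(by=["col1", "col2"], ascending=[False, True])

print(multi)
print(multi.index.tolist())  # [2, 0, 3, 1]

# The original row labels are kept; reset_index(drop=True) renumbers them.
renumbered = multi.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```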
Parameters of sort_values()
- by: The column name, or list of column names, to sort by. This argument is required.
- ascending: A Boolean value, or a list of Boolean values when sorting by several columns, that selects ascending (True) or descending (False) order. Its default value is True.
- axis: 0 or ‘index’; 1 or ‘columns’. The default value is 0. It selects which axis is sorted: 0 reorders the rows, 1 reorders the columns.
- inplace: A Boolean value. The default value is False. If True, the DataFrame is sorted in place and the method returns None instead of a new DataFrame.
- kind: ‘quicksort’, ‘mergesort’, ‘heapsort’ or ‘stable’. It is an optional parameter that selects the sorting algorithm; the default is ‘quicksort’. For a DataFrame, it is applied only when you sort on a single column or label.
- na_position: ‘first’ or ‘last’. ‘first’ puts NaNs at the beginning, while ‘last’ puts NaNs at the end. The default is ‘last’.
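The effect of na_position, inplace and kind can be checked on small frames. This is a minimal sketch; the frames demo and ties and their values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in col1.
demo = pd.DataFrame(
    {"col1": [3.0, np.nan, 1.0, 2.0], "col2": ["a", "b", "c", "d"]}
)

# na_position='first' moves the NaN row to the top.
nan_first = demo.sort_values(by="col1", na_position="first")
print(nan_first.index.tolist())  # [1, 2, 3, 0]

# kind='mergesort' is a stable sort: rows with equal keys keep
# their original relative order.
ties = pd.DataFrame({"group": ["b", "a", "b", "a"], "seq": [1, 2, 3, 4]})
stable = ties.sort_values(by="group", kind="mergesort")
print(stable.index.tolist())  # [1, 3, 0, 2]

# inplace=True modifies demo itself and returns None.
# NaN still goes last because na_position defaults to 'last'.
result = demo.sort_values(by="col1", ascending=False, inplace=True)
print(result)                # None
print(demo.index.tolist())   # [0, 3, 2, 1]
```

A stable algorithm matters mainly when the frame has already been ordered by another column and that earlier order should survive among equal keys.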