How to sort values in dataframe in python?

vishrut-singhal · 21 May 2021 18:32

Pandas DataFrame sorting: sort_index() and sort_values()

Older pandas versions offered DataFrame.sort(), but it was removed in pandas 0.20. Today a DataFrame can be sorted in two ways:

By label, with sort_index()
By actual value, with sort_values()

Before looking at each kind of sorting, let's create a sample DataFrame to demonstrate with:

import pandas as pd
import numpy as np

# 10x2 random values, with row labels and column names deliberately out of order
info = pd.DataFrame(np.random.randn(10, 2), index=[1, 3, 7, 2, 4, 5, 9, 8, 0, 6], columns=['col2', 'col1'])
print(info)

Output

       col2      col1
1 -0.456763 -0.931156
3  0.242766 -0.793590
7  1.133803  0.454363
2 -0.843520 -0.938268
4 -0.018571 -0.315972
5 -1.951544 -1.300100
9 -0.711499  0.031491
8  1.648080  0.695637
0  2.576250 -0.625171
6 -0.301717  0.879970
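np.random.randn() draws new numbers on every run, so the values you see will differ from the ones printed above. If you want your own runs to be repeatable, one option, sketched here under the assumption that NumPy 1.17 or later is installed, is to seed NumPy's Generator API. The seed 42 is an arbitrary choice:

```python
import numpy as np
import pandas as pd

# Seeded generator: the same seed yields the same numbers on every run
rng = np.random.default_rng(42)

info = pd.DataFrame(
    rng.standard_normal((10, 2)),
    index=[1, 3, 7, 2, 4, 5, 9, 8, 0, 6],  # unsorted row labels
    columns=['col2', 'col1'],              # unsorted column names
)
print(info)
```

Seeding makes your runs repeatable, but it will not reproduce the exact numbers shown in this post, which came from an unseeded run.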

In the DataFrame above, neither the row labels nor the values are sorted. Let's see how to sort them.

By label

A DataFrame can be sorted by its labels with the sort_index() method. You can pass axis to choose which labels to sort (rows or columns) and ascending to set the order. By default, sort_index() sorts the row labels in ascending order.

Example

import pandas as pd  
import numpy as np  
info=pd.DataFrame(np.random.randn(10,2),index=[1,2,5,4,8,7,9,3,0,6],columns = ['col4','col3'])  
info2=info.sort_index()  
print(info2)

Output

       col4      col3
0  0.698346  1.897573
1  1.247655 -1.208908
2 -0.469820 -0.546918
3 -0.793445  0.362020
4 -1.184855 -1.596489
5  1.500156 -0.397635
6 -1.239635 -0.255545
7  1.110986 -0.681728
8 -1.797474  0.108840
9  0.063048  1.512421
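One point worth checking: sort_index() returns a new, sorted DataFrame and leaves the original as it was (unless inplace=True is passed). A small sketch with hand-picked values instead of random ones, so the result is predictable:

```python
import pandas as pd

# Hand-picked values with unsorted row labels
info = pd.DataFrame({'col4': [10, 20, 30], 'col3': [1, 2, 3]}, index=[2, 0, 1])

info2 = info.sort_index()  # returns a new DataFrame

print(info2.index.tolist())  # [0, 1, 2]
print(info.index.tolist())   # [2, 0, 1]  (the original is unchanged)
print(info2['col4'].tolist())  # [20, 30, 10]  (each row keeps its values)
```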

Order of Sorting
The sort order is controlled by passing a Boolean to the ascending parameter: True (the default) sorts in ascending order, and False sorts in descending order.

Example:

import pandas as pd  
import numpy as np  
info= pd.DataFrame(np.random.randn(10,2),index=[1,4,7,2,5,3,0,8,9,6],columns = ['col4','col5'])  
  
info_2 = info.sort_index(ascending=False)  
print(info_2)

Output

       col4      col5
9  0.656063  0.122720
8  0.208479 -1.234640
7  0.537063 -0.774384
6  0.347990 -0.410401
5  0.331764 -0.741020
4 -0.456203 -1.255311
3 -0.082334  0.304390
2 -1.937455  0.257315
1  0.664336 -1.846533
0 -0.983810 -0.711582
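With random data the result is hard to check by eye, so here is a sketch on a tiny hand-made DataFrame (values chosen only for illustration) showing that ascending=False puts the largest label first while each row keeps its values:

```python
import pandas as pd

info = pd.DataFrame({'col4': [1.5, -0.2, 0.7], 'col5': [3, 1, 2]}, index=[4, 9, 0])

info_2 = info.sort_index(ascending=False)  # largest row label first

print(info_2.index.tolist())   # [9, 4, 0]
print(info_2['col5'].tolist())  # [1, 3, 2]
```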

Sort the Columns:
To sort the column labels instead of the row labels, pass axis=1. The default, axis=0, sorts the row labels.

Example:

import pandas as pd  
import numpy as np  
   
info = pd.DataFrame(np.random.randn(10, 2), index=[1, 4, 8, 2, 0, 6, 7, 5, 3, 9], columns=['col7', 'col4'])
info_2 = info.sort_index(axis=1)  # sort the column labels, not the rows
print(info_2)

Output

       col4      col7
1 -0.509367 -1.609514
4 -0.516731  0.397375
8 -0.201157 -0.009864
2  1.440567  1.058436
0  0.955486 -0.009777
6 -1.211133  0.415147
7  0.095644  0.531727
5 -0.881241 -0.871342
3  0.206327 -1.154724
9  1.418127  0.146788
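A sketch with fixed data showing that axis=1 reorders the columns by name while leaving the row order alone, and that it combines with ascending=False. The column names and values here are illustrative only:

```python
import pandas as pd

info = pd.DataFrame({'col7': [1, 2], 'col4': [3, 4], 'col5': [5, 6]}, index=[1, 0])

by_name = info.sort_index(axis=1)                        # column names, ascending
by_name_desc = info.sort_index(axis=1, ascending=False)  # column names, descending

print(by_name.columns.tolist())       # ['col4', 'col5', 'col7']
print(by_name_desc.columns.tolist())  # ['col7', 'col5', 'col4']
print(by_name.index.tolist())         # [1, 0]  (rows are not reordered)
```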

By Actual Value
The second kind of sorting orders the DataFrame by its values rather than its labels. Just as sort_index() sorts by labels, sort_values() sorts by values.

Pass the by argument to name the column whose values determine the order.

Example:

import pandas as pd  
import numpy as np  
info = pd.DataFrame({'col1':[7,1,8,3],'col2':[8,12,4,9]})  
info_2 = info.sort_values(by='col2')  
print(info_2)

Output

   col1  col2
2     8     4
0     7     8
3     3     9
1     1    12

In the output above, only col2 is in sorted order. Each row's col1 value and index label move together with its col2 value, which is why col1 and the index look unsorted.
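The by argument also accepts a list of column names: rows are ordered by the first column, and later columns break ties. ascending can then be a list of Booleans, one per column. A sketch with made-up data (the column names team, score and name are hypothetical):

```python
import pandas as pd

info = pd.DataFrame({
    'team':  ['b', 'a', 'b', 'a'],
    'score': [5, 9, 7, 8],
    'name':  ['w', 'x', 'y', 'z'],
})

# Sort by team in ascending order; within each team, by score in descending order
result = info.sort_values(by=['team', 'score'], ascending=[True, False])

print(result['name'].tolist())  # ['x', 'z', 'y', 'w']
print(result.index.tolist())    # [1, 3, 2, 0]
```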

Parameters of sort_values()

by: The name, or list of names, of the column(s) to sort by. This parameter is required.
ascending: A Boolean, or a list of Booleans with one entry per column in by. True sorts in ascending order. The default is True.
axis: 0 or ‘index’ (the default) sorts the rows; 1 or ‘columns’ sorts the columns, in which case by names row labels.
inplace: A Boolean, default False. If True, the DataFrame is sorted in place and the method returns None instead of a new DataFrame.
kind: ‘quicksort’ (the default), ‘mergesort’, ‘heapsort’ or ‘stable’. Selects the sorting algorithm; it is applied only when sorting on a single column or label.
na_position: ‘first’ or ‘last’. ‘first’ puts NaNs at the beginning and ‘last’ puts them at the end. The default is ‘last’.
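A short sketch exercising na_position, inplace and kind on a small, hypothetical DataFrame with one missing value; the values are chosen only so the results are easy to check:

```python
import numpy as np
import pandas as pd

info = pd.DataFrame({'col1': [3.0, np.nan, 1.0, 2.0]})

# na_position='first' moves the NaN row to the top
first = info.sort_values(by='col1', na_position='first')
print(first.index.tolist())  # [1, 2, 3, 0]

# Default na_position='last'
last = info.sort_values(by='col1')
print(last.index.tolist())   # [2, 3, 0, 1]

# inplace=True sorts info itself and returns None
returned = info.sort_values(by='col1', ascending=False, inplace=True)
print(returned)              # None
print(info.index.tolist())   # [0, 3, 2, 1]  (NaN still last)

# kind='mergesort' is stable: rows with equal keys keep their original order
ties = pd.DataFrame({'key': [2, 1, 2, 1], 'tag': ['a', 'b', 'c', 'd']})
print(ties.sort_values(by='key', kind='mergesort')['tag'].tolist())  # ['b', 'd', 'a', 'c']
```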