Grouping and aggregating in pandas on multiple columns

rajanikant-ghate · 26 September 2021 05:44

Suppose you have the following dataset:

You’d like to aggregate to a following dataset group by expiry date:

Find the first open value
Find the last close value (expiry value)
Highest value ever reached
Lowest value ever reached

The simple purpose of this analysis could be to study a distribution to estimate a stop loss and target value for a trade strategy.

First step:
Group by the data frame by column “Expiry”
df.groupby('Expiry')

Second step:
Aggregate each required column corresponding to the available functions .
An example of aggregation dict is {‘col_name’:‘max’}, which will compute the max value of the column as per the group. So the code will be:
df.groupby(‘Expiry’).agg({'Open':'first','Close':'last','High':'max','Low':'min'})

Last step:
Reset index!. It’s not compulsory, but it’s often convenient to reset index and sometimes even rename the columns. As most of the times these datasets would further be sliced for different analysis. Any yeah do you think you’ll need drop=True?

Final code:
df.groupby('Expiry').agg({'Open':'first','Close':'last','High':'max','Low':'min'}).reset_index(drop=True)