What is categorical data?

chayan-kathuria · 10 October 2021 17:49

Categorical variables are usually represented as ‘strings’ or ‘categories’ and can take only a finite number of possible values. Here are a few examples:

The city where a person lives: Delhi, Mumbai, Ahmedabad, Bangalore, etc.
The department a person works in: Finance, Human resources, IT, Production.
The highest degree a person has: High school, Diploma, Bachelors, Masters, PhD.
The grades of a student: A+, A, B+, B, B- etc.
In each of the above examples, the variable can take only a fixed set of possible values. Further, we can see that there are two kinds of categorical data:

Ordinal Data: The categories have an inherent order
Nominal Data: The categories do not have an inherent order
When encoding ordinal data, one should retain the information about the order of the categories. In the example above, the highest degree a person holds gives vital information about their qualifications, and it is an important feature for deciding whether a person is suitable for a post.
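
As a minimal sketch of an order-preserving encoding, the snippet below maps each degree to an integer rank. The ranking (High school lowest, PhD highest) follows the list in the example above; the names DEGREE_ORDER, degree_to_rank and encode_degrees, and the sample list of people, are made up for illustration and do not come from any particular library.

```python
# Degrees listed from lowest to highest (assumed order, taken from the example above).
DEGREE_ORDER = ["High school", "Diploma", "Bachelors", "Masters", "PhD"]

# Map each category to its position in the list, so a higher degree gets a larger code.
degree_to_rank = {degree: rank for rank, degree in enumerate(DEGREE_ORDER)}


def encode_degrees(degrees):
    """Return the integer rank of each degree in the input list."""
    return [degree_to_rank[d] for d in degrees]


# Hypothetical sample data, for illustration only.
people = ["Masters", "High school", "PhD", "Diploma"]
encoded = encode_degrees(people)
print(encoded)  # [3, 0, 4, 1]
```

Because the codes follow the order of the categories, comparing two codes tells us which person holds the higher degree. The spacing between codes is arbitrary, though: this encoding does not claim that a PhD is "four times" a Diploma.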

When encoding nominal data, we only have to consider the presence or absence of a category; no notion of order is present. Take the city a person lives in: it is important to retain where a person lives, but the cities have no order or sequence. Living in Delhi is neither greater nor less than living in Bangalore.
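
One common way to encode presence or absence without implying an order is one-hot encoding, sketched below in plain Python. The choice of one-hot encoding is an assumption here, not something the text prescribes, and the names CITIES and one_hot as well as the sample residents are hypothetical.

```python
# The known categories; their position in this list carries no meaning.
CITIES = ["Delhi", "Mumbai", "Ahmedabad", "Bangalore"]


def one_hot(city, categories=CITIES):
    """Return a list with 1 at the position of `city` and 0 everywhere else."""
    if city not in categories:
        raise ValueError(f"unknown category: {city!r}")
    return [1 if city == c else 0 for c in categories]


# Hypothetical sample data, for illustration only.
residents = ["Mumbai", "Delhi", "Bangalore"]
encoded = [one_hot(city) for city in residents]

for city, row in zip(residents, encoded):
    print(city, row)
```

Each city gets its own column, so every row contains exactly one 1. No city is numerically larger than another, which matches the idea that nominal categories are unordered.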