What if there are too many categorical values?

rajanikant-ghate · 18 July 2022 17:51

What if a column has a very large number of categorical values, say well above 100? Would you convert this column into features using one-hot encoding? Model performance can drop drastically if there are too many inputs to it. [more on this in a different thread]

For this, try using TensorFlow's embedding layer (tf.keras.layers.Embedding).

An embedding layer converts sparse integer indexes (one per category) into dense, fixed-size vectors whose values are learned during training.
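To make the idea concrete, here is a minimal pure-Python sketch of what an embedding lookup does. It is not how TensorFlow implements it, and the sizes (150 categories, 4 dimensions) and the random table are made up for illustration. It shows that looking up row i of the table gives the same result as multiplying a one-hot vector by the table, without ever building the wide one-hot vector.

```python
import random

random.seed(0)

num_categories = 150  # hypothetical: distinct values in the column
embed_dim = 4         # hypothetical: size of each dense vector

# The "embedding table": one row of embed_dim floats per category.
# In a real model these values are learned; here they are just random.
table = [[random.uniform(-1.0, 1.0) for _ in range(embed_dim)]
         for _ in range(num_categories)]


def embed(index):
    """Embedding lookup: pick the row that belongs to this category index."""
    return table[index]


def one_hot(index, size):
    """One-hot encoding: a vector of `size` zeros with a single 1."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def one_hot_times_table(index):
    """Same result as embed(), computed the expensive way (one-hot x table)."""
    vec = one_hot(index, num_categories)
    return [sum(vec[r] * table[r][c] for r in range(num_categories))
            for c in range(embed_dim)]


category_index = 42
print(len(one_hot(category_index, num_categories)))  # width of one-hot input
print(len(embed(category_index)))                    # width of embedded input
```

The model then sees embed_dim inputs for this column instead of num_categories inputs.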

The pre-built embedding_layer instance can then be added to a Sequential model (e.g. model.add(embedding_layer)), called in a Functional model (e.g. x = embedding_layer(x)), or used in a subclassed model.
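Here is a rough sketch of the Sequential and Functional usages, assuming TensorFlow 2.x with the Keras API. The vocabulary size (500), embedding width (8), and the single Dense output are placeholder choices, not recommendations, and the categories are assumed to be already mapped to integer ids in the range 0 to vocab_size - 1.

```python
import numpy as np
import tensorflow as tf

vocab_size = 500  # hypothetical: number of distinct categories in the column
embed_dim = 8     # hypothetical: size of the learned vector per category

# Functional model: call the layer on a tensor.
embedding_layer = tf.keras.layers.Embedding(input_dim=vocab_size,
                                            output_dim=embed_dim)
inputs = tf.keras.Input(shape=(1,), dtype="int32")  # one category id per row
x = embedding_layer(inputs)                         # (batch, 1, embed_dim)
x = tf.keras.layers.Flatten()(x)                    # (batch, embed_dim)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
func_model = tf.keras.Model(inputs, outputs)

# Sequential model: add a (separate) embedding layer with model.add(...).
seq_model = tf.keras.Sequential([tf.keras.Input(shape=(1,), dtype="int32")])
seq_model.add(tf.keras.layers.Embedding(input_dim=vocab_size,
                                        output_dim=embed_dim))
seq_model.add(tf.keras.layers.Flatten())
seq_model.add(tf.keras.layers.Dense(1, activation="sigmoid"))

# Three example rows, each holding one integer category id.
ids = np.array([[3], [42], [499]], dtype="int32")
func_preds = func_model.predict(ids, verbose=0)
seq_preds = seq_model.predict(ids, verbose=0)
print(func_preds.shape, seq_preds.shape)
```

If the column holds strings, they would first need to be mapped to integer ids (for example with a lookup layer or a plain dictionary) before reaching the embedding layer.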